Lecture 1 Introduction - Uppsala University


Transcript of Lecture 1 Introduction - Uppsala University

Page 1:

Deep Learning

Lecture 1 – Introduction, linear models

Niklas Wahlström
Division of Systems and Control
Department of Information Technology
Uppsala University

[email protected]/katalog/nikwa778

1 / 35 [email protected] DL 2019 - Introduction, linear models

Page 2:

What is the course about?

Page 3:

Machine learning

“Machine learning is about learning, reasoning and acting based on data.”

“It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science.”

Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521:452-459, 2015.

Jordan, M. I. and Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255-260, 2015.

Page 4:

Deep learning

“Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.”

LeCun, Y., Bengio, Y. and Hinton, G. Deep learning. Nature 521:436-444, 2015.

Example: Image classification

Input: pixels of an image
Output: object identity
Each hidden layer extracts increasingly abstract features.

Zeiler, M. D. and Fergus, R. Visualizing and understanding convolutional networks. Computer Vision - ECCV, 2014.

Image removed in handout version due to copyright © Springer International Publishing

Page 5:

ex) Cancer diagnosis

Systems for detecting cell divisions (mitosis) in histology images can be used to improve (or automate) cancer diagnosis.

Image removed in handout version due to copyright © Springer-Verlag Berlin Heidelberg 2013

• Learn a model with

input: RGB histology image (pixel values)
output: number and locations (in the image) of mitosis detections

• Training data: Histology images labeled by experts.

D. C. Ciresan, A. Giusti, L. M. Gambardella and J. Schmidhuber. Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks. In Medical Image Computing and Computer Assisted Intervention, 411-418, 2013.

Page 6:

ex) Colorization (I/II)

Image removed in handout version due to copyright © Springer

Task: Colorize gray-scale photos.

• Learn a model with

input: gray-scale pixel values
output: color pixel values

R. Zhang, P. Isola, and A. A. Efros. Colorful Image Colorization. ECCV, 2016.

Page 7:

ex) Colorization (II/II)

Image removed in handout version due to copyright © Springer

Model applied to legacy grayscale photos.

Page 8:

ex) AlphaGo Zero

• Input: State of the game (19 × 19 grid, each cell either black, white or blank)

• Output: Probability for the current player to win the game

+ reinforcement learning

Silver et al. Mastering the game of Go with deep neural networks and tree search, Nature 529, 484–489, 2016.

• Input: Same

• Output: Probability for the current player to win the game and what move to make

Silver et al. Mastering the game of Go without human knowledge, Nature 550, 354–359, 2017.

Silver et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, 362(6419): 1140–1144, 2018.

Page 9:

ex) Higgs Machine Learning Challenge

HiggsML: Crowd-sourcing initiative by CERN (hosted at Kaggle)

• Separate H → ττ from background noise.

• Learn a model with

- input: 30-dimensional vector of “features” recorded during the experiment
- output: “signal” or “background”

• Deep learning methods were among the winning entries.

C. Adam-Bourdarios, G. Cowan, C. Germain, I. Guyon, B. Kegl and D. Rousseau. The Higgs boson machine learning challenge. NIPS 2014 Workshop on High-energy Physics and Machine Learning, 2014.

Page 10:

ex) Generative models

Generating novel paintings

• Learn a model with
  - input: paintings
  - output: style, genre, artist, year, etc.

• Used a generative model to create artificial paintings

• More about generative models in lecture 9

K. Jones, D. Bonafilia and A. Danyluk. GANGogh: Creating Art with GANs. 2017. https://towardsdatascience.com/gangogh-creating-art-with-gans-8d087d8f74a1

Page 11:

Course information

Page 12:

Lecture outline

1. Introduction
2. Feedforward neural networks
3. Optimization
4. Convolutional neural networks I
5. Convolutional neural networks II
6. Over-/underfitting, bias-variance trade-off
7. Regularization in deep learning
8. Variational inference
9. Variational autoencoders
10. Summary and guest lecture

Page 13:

Hand-in assignments

4 hand-in assignments (HAs):

• Cover (mainly) implementation aspects of deep learning.

• Deadlines for all HAs available on course homepage.

• Everyone has 7 joker days, which can be used with any HA.

• You are encouraged to collaborate...

• ... but you must submit your own report and code.

Optional helpdesks are scheduled after each lecture.

Page 14:

Optional project

• After the course you are encouraged to carry out a project

• Preferably 1-4 students in each team

• The projects must include real data

• Awarded 3 hp extra

• Conducted during summer, see course homepage for the dates

Page 15:

Course literature

• Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning, MIT Press, 2016. www.deeplearningbook.org

We will not follow the book strictly, though. Another great resource is

• Michael A. Nielsen. Neural Networks and Deep Learning, Determination Press, 2015. neuralnetworksanddeeplearning.com

• The first lectures are well covered by lecture notes linked from the course homepage.

All of these are available online.

Page 16:

Who are we?

Teachers involved in the course (in approximate order of appearance):

Lecturers:
• Niklas Wahlström, room 2319
• Thomas Schön, room 2209
• Joakim Lindblad, room 2141
• Jalil Taghia, room 2321

Teaching assistants:
• Fredrik Gustafsson, room 2303
• Nicolas Pielawski, room xxxx
• Anindya Gupta, room xxxx

All room numbers are at ITC Polacksbacken. You can reach us by email: <firstname.lastname>@it.uu.se.

Page 17:

Who are you?

Page 18:

Linear regression and classification

Page 19:

Supervised machine learning

Methods for automatically learning (training, estimating, . . . ) a model for the relationship between

• the input x, and
• the output y,

from observed training data

T := {(x1, y1), (x2, y2), . . . , (xn, yn)}.

Page 20:

Regression vs. classification

We will distinguish between two types of problems: regression and classification.

Regression is when the output y is quantitative, e.g.

• Climate models (y = “increase in global temperature”)

• Economic models (y = “change in GDP”)

Classification is when the output y is qualitative, e.g.

• Spam filters (y ∈ {spam, good email})
• Diagnosis systems (y ∈ {ALL, AML, CLL, CML, no leukemia})
• Fingerprint verification (y ∈ {match, no match})

Page 21:

ex) A regression problem

[Figure: distance (feet) vs. speed (mph); panels show the data (xi, yi), a fitted linear model, and predictions (x⋆, y⋆).]

Consider a linear model to explain the data:

y = wx + b

Page 22:

What is a good model?

[Figure: distance (feet) vs. speed (mph), showing the training data T together with four different candidate linear models.]

Page 23:

Learning using maximum likelihood

Assume that each data point can be described by a linear model + noise:

yi = wxi + b + εi,  εi ∼ N(0, σ²)

Maximum likelihood: Think of the εi (dotted) as Gaussian random variables, and choose the model (solid) such that the resulting εi are as likely as possible.
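The generative assumption above can be simulated directly; a minimal Python sketch, with parameter values chosen arbitrarily for illustration (not taken from the lecture data):

```python
import random

random.seed(0)  # reproducible illustration

# Assumed model: yi = w*xi + b + eps_i, with eps_i ~ N(0, sigma^2)
w_true, b_true, sigma = 3.0, 10.0, 5.0

xs = list(range(1, 21))
ys = [w_true * x + b_true + random.gauss(0.0, sigma) for x in xs]
```

Fitting w and b to (xs, ys) and comparing against w_true and b_true is a useful sanity check for any estimator.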

Page 24:

Gaussian (Normal) distribution

Probability density function (PDF) for the scalar Gaussian distribution:

N(z; µ, σ²) = (1/√(2πσ²)) e^(−(z−µ)²/(2σ²))

• µ is the mean (expected value of the distribution)
• σ is the standard deviation
• σ² is the variance

[Figure: Gaussian PDFs for µ = −1, σ² = 0.5; µ = 7, σ² = 0.2; and µ = 4, σ² = 4.]

z ∼ N(z; µ, σ²) means that z is a Gaussian random variable with mean µ and variance σ². The symbol ∼ reads “distributed according to”.
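As a quick sanity check, the density above can be evaluated directly; a minimal Python sketch (the function name is my own):

```python
import math

def gaussian_pdf(z, mu, sigma2):
    """PDF of the scalar Gaussian N(z; mu, sigma^2)."""
    return math.exp(-(z - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The PDF peaks at z = mu, where its value is 1/sqrt(2*pi*sigma^2).
print(gaussian_pdf(0.0, 0.0, 1.0))  # 1/sqrt(2*pi) ≈ 0.3989
```

Note how a smaller variance σ² gives a taller, narrower peak, matching the plotted examples.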

Page 25:

Maximum likelihood

A linear model with Gaussian noise:

yi = wxi + b + εi,  εi ∼ N(0, σ²),  i = 1, . . . , n

The model can also be expressed as

p(yi | xi, w, b) = N(yi; wxi + b, σ²)

Pick w and b which make the data as likely as possible:

w, b = argmax_{w,b} p(y1, . . . , yn | x1, . . . , xn, w, b)

Assume all εi to be independent:

p(y1, . . . , yn | x1, . . . , xn, w, b)
  = Π_{i=1}^{n} p(yi | xi, w, b)
  = Π_{i=1}^{n} N(yi; wxi + b, σ²)
  ∝ e^{ −(1/(2σ²)) Σ_{i=1}^{n} (yi − wxi − b)² }

y and z independent ⇒ p(y, z) = p(y)p(z)

Page 26:

Linear regression

The least squares problem:

w, b = argmin_{w,b} Σ_{i=1}^{n} (wxi + b − yi)²

[Figure: distance (feet) vs. speed (mph), showing the data (xi, yi) and the fitted linear regression model.]
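In the scalar case, the least squares problem above has a closed-form solution; a minimal Python sketch (the data values are made up for illustration, not the course's speed/distance data):

```python
def fit_linear(xs, ys):
    """Closed-form least squares for y = w*x + b:
    w = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2), b = y_mean - w*x_mean."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    w = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
    b = y_mean - w * x_mean
    return w, b

# Illustrative data roughly following y = 2x + 1
xs = [5.0, 10.0, 20.0, 30.0, 40.0]
ys = [11.2, 20.5, 41.3, 60.8, 81.0]
w, b = fit_linear(xs, ys)
```

The same solution is what the later lectures' numerical optimizers should converge to on this problem.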

Page 27:

ex) A classification problem

Beaver Body Temperature Data
Input: body temperature x
Output: beaver is outside (y = 1) or inside (y = 0) of the retreat

[Figure: Pr(y = 1 | x) vs. temperature x; left panel shows the data, right panel a linear regression model fitted to the data (xi, yi).]

Linear regression is not suitable, since it is not constrained to [0, 1].

Page 28:

Logistic regression

Consider a binary classification problem y ∈ {0, 1}. Write

pi = Pr(yi = 1 | xi), and thus Pr(yi = 0 | xi) = 1 − pi.

The odds is the ratio between the two class probabilities

Pr(yi = 1 | xi) / Pr(yi = 0 | xi) = pi / (1 − pi) ∈ (0, ∞)

and the log odds consequently

ln( pi / (1 − pi) ) ∈ (−∞, ∞)

Logistic regression: Assume a linear model for the log odds

ln( pi / (1 − pi) ) = wxi + b  ⇒  pi = e^{zi} / (1 + e^{zi}),  zi = wxi + b

Page 29:

Logistic function (aka sigmoid function)

The function f : R → [0, 1] defined as f(z) = e^z / (1 + e^z) is known as the logistic function.

[Figure: the logistic function e^z / (1 + e^z) plotted for z ∈ [−6, 6].]
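In code, the logistic function is usually written with a case split on the sign of z so that exp never overflows; a minimal Python sketch:

```python
import math

def logistic(z):
    """Logistic function e^z / (1 + e^z), rewritten per sign of z for numerical stability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(logistic(0.0))  # 0.5
```

Both branches are algebraically equal to e^z / (1 + e^z); the split just avoids evaluating exp on large positive arguments.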

Page 30:

Logistic regression using maximum likelihood

Pick w and b which make the data as likely as possible:

w, b = argmax_{w,b} ln Pr(y1, . . . , yn | x1, . . . , xn, w, b)

Assume all yi to be independent:

ln Pr(y1, . . . , yn | x1, . . . , xn, w, b)
  = Σ_{i=1}^{n} ln Pr(yi | xi, w, b)
  = Σ_{i: yi=1} ln Pr(yi = 1 | xi, w, b) + Σ_{i: yi=0} ln Pr(yi = 0 | xi, w, b),

where Pr(yi = 1 | xi, w, b) = pi and Pr(yi = 0 | xi, w, b) = 1 − pi. This leads to the following optimization problem:

w, b = argmin_{w,b} − Σ_{i=1}^{n} [ yi ln(pi) + (1 − yi) ln(1 − pi) ]
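The negated log-likelihood being minimized here is the binary cross-entropy loss; a minimal Python sketch (the function name is my own):

```python
import math

def cross_entropy(ys, ps):
    """Negative log-likelihood -sum_i [ yi*ln(pi) + (1-yi)*ln(1-pi) ] for binary labels yi."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(ys, ps))

# Confident, correct predictions give a small loss; hedged or wrong ones a larger loss.
```

For example, predicting p = 0.5 for a positive label costs ln 2 per point, while p close to 1 costs almost nothing.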

ln(·) strictly increasing ⇒ argmax_x f(x) = argmax_x ln f(x)

Page 31:

The Beaver data example

[Figure: Pr(y = 1 | x) vs. temperature x; left panel shows the data, right panel the fitted logistic regression model, with regions labeled y = 0 and y = 1.]

Page 32:

Linear and logistic regression
Multidimensional input

Linear and logistic regression also work with multidimensional inputs

xi = [xi1, . . . , xip]^T,  i = 1, . . . , n

We assign one parameter to each input dimension:

w = [w1, . . . , wp]^T

[Figure: linear regression (left) and logistic regression (right) fitted to data with two-dimensional inputs.]

Linear regression: yi = w^T xi + b
Logistic regression: Pr(yi = 1 | xi) = e^{w^T xi + b} / (1 + e^{w^T xi + b})

Page 33:

Linear and logistic regression

         Linear regression        Logistic regression
Output   yi ∈ R                   yi ∈ {0, 1}
Model    ŷi = w^T xi + b          pi = Pr(yi = 1 | xi) = e^{w^T xi + b} / (1 + e^{w^T xi + b})
Loss     Li = (ŷi − yi)²          Li = −yi ln(pi) − (1 − yi) ln(1 − pi)

We find w and b by minimizing the sum of the losses:

w, b = argmin_{w,b} (1/n) Σ_{i=1}^{n} Li

• For linear regression, the problem can be solved analytically.
• For logistic regression, we need to use numerical optimization (we will talk about it in the next lecture).

Page 34:

Hand-in assignment 1A
Classifying biopsies using logistic regression

Biopsy data {(xi, yi)}, i = 1, . . . , n, from n = 699 breast tumors

Input:
xi1 clump thickness
xi2 uniformity of cell size
xi3 uniformity of cell shape
...
xi9 mitoses

Output: yi benign/malignant

Tasks:

• Derive the gradients of the cost function w.r.t. the parameters

• Implement the logistic regression model

• Update the parameters with gradient descent

The code will be extended to a neural network in HA 1B.
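The three tasks above can be sketched for a one-dimensional input (the biopsy data itself is 9-dimensional; this toy version only illustrates the gradient expressions, which for the mean cross-entropy cost work out to dC/dw = (1/n) Σ (pi − yi) xi and dC/db = (1/n) Σ (pi − yi)):

```python
import math

def sigmoid(z):
    # Logistic function, split on the sign of z to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Gradient descent on the mean cross-entropy cost C = (1/n) sum_i Li.
    Gradients: dC/dw = (1/n) sum_i (pi - yi) * xi,  dC/db = (1/n) sum_i (pi - yi)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ps = [sigmoid(w * x + b) for x in xs]
        w -= lr * sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / n
        b -= lr * sum(p - y for p, y in zip(ps, ys)) / n
    return w, b

# Toy separable data: negative x -> class 0, positive x -> class 1.
w, b = train_logistic([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```

Generalizing to the assignment's 9-dimensional inputs means replacing the scalar products w * x with inner products w^T x and accumulating one gradient component per input dimension.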

Page 35:

A few concepts to summarize lecture 1

Machine Learning: Deals with learning, reasoning and acting based on data.

Deep Learning: A set of machine learning methods that allow models composed of multiple processing layers.

Regression: Learning problem where the output is quantitative.

Classification: Learning problem where the output is qualitative.

Maximum likelihood: Learning objective based on probability theory.

Linear Regression: Linear model for regression problems.

Logistic Regression: Linear model for classification problems.
