Deep Learning

Andreas Geiger

Autonomous Vision Group, MPI Tübingen

Computer Vision and Geometry Lab, ETH Zürich

January 13, 2017

Deep Learning

“Deep learning is just a buzzword for neural nets, and neural nets are just a stack of matrix-vector multiplications, interleaved with some non-linearities. No magic there.” Ronan Collobert, 2011

“I sometimes get questions like: how does deep learning compare with graphical models? There is no answer to this question because deep learning and graphical models are orthogonal concepts that can be advantageously combined.” Yann LeCun, 2013

Representation Matters

[Figure: the same data plotted in Cartesian coordinates (x, y) and in polar coordinates (r, θ); the task is easy in one representation and hard in the other]

Classification

[Figure: input image → model → output label “Beach”]

• fθ : x ∈ ℝ^(W×H) ↦ y ∈ {1, …, L} (class index)
• fθ : x ∈ ℝ^(W×H) ↦ y = [0, …, 1, …, 0] ∈ {0, 1}^L (one-hot vector; see the sketch below)
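Both encodings carry the same information; a minimal NumPy sketch (class count and label chosen arbitrarily for illustration) converts between them:

```python
import numpy as np

L = 5          # number of classes (arbitrary)
label = 2      # class index in {0, ..., L-1}

# One-hot encoding: a vector with a single 1 at the label's position
y_onehot = np.zeros(L)
y_onehot[label] = 1.0

# Recover the class index from the one-hot vector
assert np.argmax(y_onehot) == label
```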

Linear Regression

• Mapping: x = image, y = Ax + a
• Classification: L* = argmax_l y_l (see the sketch below)

[Figure: fully connected linear layer from inputs x1, …, xN (plus a constant bias input 1) to outputs y1, …, yL, with weights A11, …, ANL and biases a1, …, aL]
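A minimal NumPy sketch of this model, with random weights standing in for learned parameters (all sizes are arbitrary):

```python
import numpy as np

N, L = 32 * 32, 10                  # flattened image size, number of classes
rng = np.random.default_rng(0)

x = rng.random(N)                   # x = image, flattened to a vector
A = rng.standard_normal((L, N))     # weight matrix (would be learned)
a = rng.standard_normal(L)          # bias vector (would be learned)

y = A @ x + a                       # linear mapping y = Ax + a
label = np.argmax(y)                # classification: L* = argmax_l y_l
```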

Linear Regression

• Two layers:
  x = image
  h = Ax + a
  y = Bh + b
• Is this model better? No. The composition is still linear:
  y = B(Ax + a) + b = BAx + Ba + b = Cx + c, with C = BA and c = Ba + b
  Stacking linear layers therefore adds no expressive power (verified numerically below).

[Figure: two-layer network from inputs x1, …, xN through hidden units h1, …, hM to outputs y1, …, yL, with weights A, B and biases a, b; it collapses to a single layer with weights C and biases c]
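The collapse is easy to verify numerically; the following sketch (arbitrary random parameters) checks that the two-layer model and its single-layer equivalent agree:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 8, 6, 4                   # input, hidden, output sizes (arbitrary)

x = rng.random(N)
A, a = rng.standard_normal((M, N)), rng.standard_normal(M)
B, b = rng.standard_normal((L, M)), rng.standard_normal(L)

# Two linear layers
y_two = B @ (A @ x + a) + b

# Equivalent single layer: C = BA, c = Ba + b
C, c = B @ A, B @ a + b
y_one = C @ x + c

assert np.allclose(y_two, y_one)    # identical outputs: no gain in expressiveness
```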

Logistic Regression

• Mapping: x = image, y = σ(Ax + a)
• With (elementwise): σ(x) = 1 / (1 + exp(−x))
• Classification: L* = argmax_l y_l (see the sketch below)

[Figure: the sigmoid σ(x) rising from 0 to 1 over x ∈ [−10, 10]; same fully connected layer as before, with σ applied to each output]
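A sketch of the logistic model in NumPy (random weights for illustration); σ is applied elementwise and squashes each output into (0, 1):

```python
import numpy as np

def sigmoid(x):
    # Elementwise logistic function sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N, L = 16, 3                        # input and output sizes (arbitrary)
x = rng.random(N)
A, a = rng.standard_normal((L, N)), rng.standard_normal(L)

y = sigmoid(A @ x + a)              # y = sigma(Ax + a), each y_l in (0, 1)
label = np.argmax(y)                # L* = argmax_l y_l
```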

Neural Networks

• Two layers:
  x = image
  h = σ(Ax + a)
  y = σ(Bh + b)
• Now: y = σ(B σ(Ax + a) + b)
  The non-linearity σ prevents the two layers from collapsing into a single linear map (forward pass sketched below).

[Figure: two-layer network from inputs x1, …, xN through hidden units h1, …, hM to outputs y1, …, yL, with weights A, B, biases a, b, and σ applied at each layer]
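A forward pass of this two-layer network, reusing the sigmoid from above (random weights, arbitrary sizes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N, M, L = 16, 8, 3                  # input, hidden, output sizes (arbitrary)
x = rng.random(N)
A, a = rng.standard_normal((M, N)), rng.standard_normal(M)
B, b = rng.standard_normal((L, M)), rng.standard_normal(L)

h = sigmoid(A @ x + a)              # hidden layer
y = sigmoid(B @ h + b)              # output: y = sigma(B sigma(Ax + a) + b)
```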

Training Neural Networks

[Figure: the network maps a training input xobs to an estimate yest, which is compared to the observed target yobs via an error E = …; the error is minimized by back-propagating its gradient through the network]

D. Rumelhart, G. Hinton and R. Williams: Learning representations by back-propagating errors. Nature, 1986.
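The slide leaves the error E unspecified; the following is a minimal training sketch, assuming a squared-error loss E = ½‖yest − yobs‖² and plain gradient descent, with gradients obtained by the chain rule as in Rumelhart et al.:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N, M, L, lr = 16, 8, 3, 0.5               # sizes and learning rate (arbitrary)
A = 0.1 * rng.standard_normal((M, N))
a = np.zeros(M)
B = 0.1 * rng.standard_normal((L, M))
b = np.zeros(L)
x_obs = rng.random(N)                     # one toy training pair
y_obs = np.eye(L)[0]

for step in range(1000):
    # Forward pass
    h = sigmoid(A @ x_obs + a)
    y_est = sigmoid(B @ h + b)
    E = 0.5 * np.sum((y_est - y_obs) ** 2)    # assumed squared-error loss

    # Backward pass: chain rule, using sigma'(z) = sigma(z) * (1 - sigma(z))
    dy = (y_est - y_obs) * y_est * (1 - y_est)    # gradient at output layer
    dh = (B.T @ dy) * h * (1 - h)                 # gradient at hidden layer

    # Gradient-descent parameter updates
    B -= lr * np.outer(dy, h)
    b -= lr * dy
    A -= lr * np.outer(dh, x_obs)
    a -= lr * dh
```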

Convolutional Neural Networks

• Convolution filters with shared parameters
• Subsampling / pooling / unpooling (see the sketch below)

Try yourself: www.cvlibs.net/learn

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner: Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, Vol. 86, no. 11, pp. 2278–2324.
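A minimal NumPy sketch of both building blocks (a single random 3×3 filter and 2×2 max-pooling; real networks learn many filters per layer):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))    # one 3x3 filter; its parameters are
                                        # shared across all spatial locations

# Convolution: slide the same kernel over the image
H, W, k = image.shape[0], image.shape[1], kernel.shape[0]
feature_map = np.zeros((H - k + 1, W - k + 1))
for i in range(H - k + 1):
    for j in range(W - k + 1):
        feature_map[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)

# Subsampling: 2x2 max-pooling keeps the maximum of each block
h, w = feature_map.shape
pooled = feature_map[:h // 2 * 2, :w // 2 * 2]
pooled = pooled.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```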


Depth Matters

ImageNet classification, top-5 error (%):

  ILSVRC'10              shallow       28.2
  ILSVRC'11              shallow       25.8
  ILSVRC'12  AlexNet     8 layers      16.4
  ILSVRC'13              8 layers      11.7
  ILSVRC'14  VGG         19 layers      7.3
  ILSVRC'14  GoogleNet   22 layers      6.7
  ILSVRC'15  ResNet      152 layers     3.57

K. He, X. Zhang, S. Ren and J. Sun: Deep Residual Learning for Image Recognition. CVPR, 2016. Best Paper Award.

Feature Visualization

M. Zeiler and R. Fergus: Visualizing and Understanding Convolutional Networks. ECCV, 2014.


Image Captioning

O. Vinyals, A. Toshev, S. Bengio and D. Erhan: Show and Tell: A Neural Image Caption Generator. CVPR, 2015.

Graphical Models vs. Deep Learning

Graphical Models
• Probabilistic
• Dependencies between random variables
• Low capacity
• Domain knowledge: easy

Deep Neural Networks
• Deterministic
• Input/output mapping
• High capacity
• Domain knowledge: hard

Combinations:
D. Kingma and M. Welling: Auto-encoding Variational Bayes. ICLR, 2014.
L. Chen, A. Schwing and R. Urtasun: Learning Deep Structured Models. ICML, 2015.
J. Domke: Learning graphical model parameters with approximate marginal inference. PAMI, 2013, Vol. 35, no. 10, pp. 2454–2467.