ECE 6504: Deep Learning for Perception · f15ece6504/slides/L14_RNNs... · 2015-10-15


ECE 6504: Deep Learning for Perception

Dhruv Batra Virginia Tech

Topics:
–  Recurrent Neural Networks (RNNs)
–  BackProp Through Time (BPTT)
–  Vanishing / Exploding Gradients
–  [Abhishek:] Lua / Torch Tutorial


Administrativia
•  HW3
   –  Out today
   –  Due in 2 weeks
   –  Please please please please please start early
   –  https://computing.ece.vt.edu/~f15ece6504/homework3/

(C) Dhruv Batra 2


Plan for Today
•  Model
   –  Recurrent Neural Networks (RNNs)
•  Learning
   –  BackProp Through Time (BPTT)
   –  Vanishing / Exploding Gradients
•  [Abhishek:] Lua / Torch Tutorial


New Topic: RNNs

Image Credit: Andrej Karpathy

Synonyms
•  Recurrent Neural Networks (RNNs)
•  Recursive Neural Networks
   –  General family; think graphs instead of chains
•  Types:
   –  Long Short Term Memory (LSTMs)
   –  Gated Recurrent Units (GRUs)
   –  Hopfield network
   –  Elman networks
   –  …
•  Algorithms
   –  BackProp Through Time (BPTT)
   –  BackProp Through Structure (BPTS)


What’s wrong with MLPs?
•  Problem 1: Can’t model sequences
   –  Fixed-sized Inputs & Outputs
   –  No temporal structure
•  Problem 2: Pure feed-forward processing
   –  No “memory”, no feedback

Image Credit: Alex Graves, book

Sequences are everywhere…

Image Credit: Alex Graves and Kevin Gimpel


Even where you might not expect a sequence…

Image Credit: Vinyals et al.


Even where you might not expect a sequence…

•  Input ordering = sequence

Image Credit: Ba et al.; Gregor et al.

Image Credit: [Pinheiro and Collobert, ICML14]

Why model sequences?

Figure Credit: Carlos Guestrin


Why model sequences?

Image Credit: Alex Graves

Name that model

Hidden Markov Model (HMM)

[Figure: HMM over hidden variables Y1, …, Y5, each taking a value in {a,…z}, with one observation X1, …, X5 per hidden variable]

Figure Credit: Carlos Guestrin

How do we model sequences?
•  No input

Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
•  With inputs

Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
•  With inputs and outputs

Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
•  With Neural Nets

Image Credit: Alex Graves

How do we model sequences?
•  It’s a spectrum…
   –  Input: No sequence; Output: No sequence. Example: “standard” classification / regression problems
   –  Input: No sequence; Output: Sequence. Example: Im2Caption
   –  Input: Sequence; Output: No sequence. Example: sentence classification, multiple-choice question answering
   –  Input: Sequence; Output: Sequence. Example: machine translation, video captioning, open-ended question answering, video question answering

Image Credit: Andrej Karpathy


Things can get arbitrarily complex

Image Credit: Herbert Jaeger

Key Ideas
•  Parameter Sharing + Unrolling
   –  Keeps the number of parameters in check
   –  Allows arbitrary sequence lengths!
•  “Depth”
   –  Measured in the usual sense of layers
   –  Not unrolled timesteps
•  Learning
   –  Is tricky even for “shallow” models due to unrolling
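
Parameter sharing and unrolling can be sketched in a few lines. The block below is a minimal illustration, not the model from the slides: it assumes a one-unit tanh RNN, and the names `rnn_forward`, `w_xh`, `w_hh`, and `b_h` are hypothetical. The same three parameters are reused at every timestep, so the parameter count does not depend on the sequence length.

```python
import math

def rnn_forward(xs, w_xh, w_hh, b_h, h0=0.0):
    """Unroll a one-unit tanh RNN over xs (illustrative, scalar state).

    The same parameters (w_xh, w_hh, b_h) are applied at every timestep.
    """
    h = h0
    hs = []
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h + b_h)
        hs.append(h)
    return hs

# Hypothetical parameter values, chosen only for the demonstration.
params = dict(w_xh=0.5, w_hh=0.9, b_h=0.0)

# The same three parameters handle sequences of different lengths.
short = rnn_forward([1.0, -1.0], **params)
longer = rnn_forward([1.0, -1.0, 0.5, 0.0, 2.0], **params)
```

Because the recurrence only looks backward, the first two hidden states of `longer` match `short`; unrolling for more steps adds computation but no new parameters.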


Plan for Today
•  Model
   –  Recurrent Neural Networks (RNNs)
•  Learning
   –  BackProp Through Time (BPTT)
   –  Vanishing / Exploding Gradients
•  [Abhishek:] Lua / Torch Tutorial


BPTT
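
The slide's derivation did not survive extraction, so here is a minimal sketch of BackProp Through Time under stated assumptions: a one-unit tanh RNN `h_t = tanh(w * h_{t-1} + u * x_t)` with a squared-error loss on the final hidden state only. The function names and the loss are illustrative choices, not taken from the lecture. The backward loop walks the unrolled network from the last timestep to the first and accumulates the gradient of the shared parameters at every step.

```python
import math

def forward(xs, w, u, h0=0.0):
    """Return [h_0, h_1, ..., h_T] for h_t = tanh(w * h_{t-1} + u * x_t)."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(xs, y, w, u):
    """Squared error on the final hidden state (an assumed, illustrative loss)."""
    return 0.5 * (forward(xs, w, u)[-1] - y) ** 2

def bptt(xs, y, w, u):
    """Gradients (dL/dw, dL/du) via backprop through the unrolled timesteps."""
    hs = forward(xs, w, u)
    dw = du = 0.0
    dh = hs[-1] - y                      # dL/dh_T
    for t in range(len(xs), 0, -1):      # t = T, T-1, ..., 1
        da = dh * (1.0 - hs[t] ** 2)     # through tanh at step t
        dw += da * hs[t - 1]             # shared w: contributions add up
        du += da * xs[t - 1]             # shared u: contributions add up
        dh = da * w                      # pass gradient back to h_{t-1}
    return dw, du

# Sanity check against central finite differences (hypothetical inputs).
xs, y, w, u = [0.5, -1.0, 0.25, 1.5], 0.3, 0.8, 0.6
dw, du = bptt(xs, y, w, u)
eps = 1e-6
num_dw = (loss(xs, y, w + eps, u) - loss(xs, y, w - eps, u)) / (2 * eps)
num_du = (loss(xs, y, w, u + eps) - loss(xs, y, w, u - eps)) / (2 * eps)
```

Note the line `dh = da * w`: the gradient reaching an early timestep has been multiplied by `w` (and a tanh derivative) once per step, which is where vanishing and exploding gradients come from.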

Image Credit: Richard Socher

Illustration [Pascanu et al]
•  Intuition
•  Error surface of a single hidden unit RNN; high-curvature walls
•  Solid lines: standard gradient descent trajectories
•  Dashed lines: gradient rescaled to fix problem
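
One way to build intuition for why gradients in a single-hidden-unit RNN can become extreme is to look at a simplified linear recurrence. The sketch below assumes `h_t = w * h_{t-1}` (no nonlinearity, no input), which is a simplification and not the exact model in the figure; `gradient_through_time` is a hypothetical helper name. In this setting the derivative of `h_T` with respect to `h_0` is just `w` multiplied by itself `T` times.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}.

    The chain rule contributes one factor of w per timestep.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

# Hypothetical recurrent weights on either side of 1.
vanishing = gradient_through_time(0.5, 50)   # shrinks toward zero
stable = gradient_through_time(1.0, 50)      # stays at one
exploding = gradient_through_time(1.5, 50)   # grows very large
```

A small change in `w` around 1 flips the gradient between vanishing and exploding over 50 steps, which is consistent with an error surface that has nearly flat regions next to very steep walls.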


Fix #1
•  Pseudocode
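
The pseudocode image was not recovered. Assuming Fix #1 refers to the gradient rescaling mentioned on the previous slide (norm clipping in the style of Pascanu et al.), a minimal sketch looks like the following. The function name `clip_by_norm` and the flat-list gradient representation are illustrative assumptions.

```python
import math

def clip_by_norm(grads, threshold):
    """Rescale grads so their L2 norm is at most threshold.

    Assumed form of the fix: if ||g|| > threshold, g <- (threshold / ||g||) * g.
    The direction of the gradient is kept; only its length changes.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grads]
    return list(grads)

# Hypothetical gradients: one too large, one already within the threshold.
clipped = clip_by_norm([3.0, 4.0], threshold=1.0)
untouched = clip_by_norm([0.3, 0.4], threshold=1.0)
```

Clipping addresses exploding gradients only; it does nothing for gradients that have already vanished.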

Image Credit: Richard Socher

Fix #2
•  Smart Initialization and ReLUs
   –  [Socher et al 2013]
   –  A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, Le et al. 2015
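
As a rough sketch of the idea in Le et al. 2015: the recurrent weight matrix of a ReLU RNN is initialized to the identity and the biases to zero, so that with no input the hidden state is copied forward unchanged. The code below is an illustration in plain Python lists; the helper names and the `x_proj` argument (standing in for the already-projected input) are hypothetical.

```python
def identity_matrix(n):
    """n x n identity, used here as the initial recurrent weight matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def relu_rnn_step(h, x_proj, w_hh, b):
    """One step of h_new = ReLU(w_hh @ h + x_proj + b) with plain lists."""
    n = len(h)
    return [
        max(0.0, sum(w_hh[i][j] * h[j] for j in range(n)) + x_proj[i] + b[i])
        for i in range(n)
    ]

n = 3
w_hh = identity_matrix(n)      # identity initialization (assumed setup)
b = [0.0] * n                  # zero biases
h_start = [0.2, 1.0, 0.0]      # hypothetical non-negative hidden state

# With zero input, the state is carried across 100 steps without decay.
h = list(h_start)
for _ in range(100):
    h = relu_rnn_step(h, [0.0] * n, w_hh, b)
```

At initialization the per-step Jacobian is the identity for active units, so gradients neither shrink nor grow as they flow back through time; training is then free to move the weights away from this starting point.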
