INF 5860 Machine learning for image classification
Transcript of INF 5860 Machine learning for image classification
IN 5400 Machine learning for image classification
Lecture 11: Recurrent neural networks
March 18 , 2020
Tollef Jahren
IN5400
18.03.2020
Outline
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
About today
Goals:
• Understand how to build a recurrent neural network
• Understand why long range dependencies are a challenge for recurrent neural networks
Readings
• Video:
– cs231n: Lecture 10 | Recurrent Neural Networks
• Read:
– The Unreasonable Effectiveness of Recurrent Neural Networks
• Optional:
– Video: cs224d L8: Recurrent Neural Networks and Language Models
– Video: cs224d L9: Machine Translation and Advanced Recurrent LSTMs and GRUs
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
Progress
Overview
• What have we learnt so far?
– Fully connected neural network
– Convolutional neural network
• What worked well with these?
– Fully connected neural network
works well when the input data is
structured
– Convolutional neural network works
well on images
• What does not work well?
– Processing data of unknown length
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
Progress
Recurrent Neural Network (RNN)
• Takes a new input
• Manipulates the state
• Reuses weights
• Gives a new output
Recurrent Neural Network (RNN)
• Unrolled view
• x_t: the input vector
• h_t: the hidden state of the RNN
• y_t: the output vector
Recurrent Neural Network (RNN)
• Recurrence formula:
– Input vector: x_t
– Hidden state vector: h_t
Note: We use the same function and parameters for every “time” step.
(Vanilla) Recurrent Neural Network
• Input vector: x_t
• Hidden state vector: h_t
• Output vector: y_t
• Weight matrices: W_hh, W_xh, W_hy
• Bias vectors: b_h, b_y

General form:
h_t = f_W(h_{t-1}, x_t)

Vanilla RNN (simplest form):
h_t = tanh(W_hh·h_{t-1} + W_xh·x_t + b_h)
y_t = W_hy·h_t + b_y
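The two vanilla RNN equations can be sketched directly in numpy. This is a minimal illustration, not the course code: the dimensions (D, H, V) and the random weight initialization are arbitrary choices, and the same weights are reused at every time step, as the slides emphasize.

```python
import numpy as np

# Illustrative sizes: input dim D, hidden dim H, output dim V.
D, H, V = 3, 4, 2
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(H, H)) * 0.1   # hidden-to-hidden weights
W_xh = rng.normal(size=(H, D)) * 0.1   # input-to-hidden weights
W_hy = rng.normal(size=(V, H)) * 0.1   # hidden-to-output weights
b_h, b_y = np.zeros(H), np.zeros(V)

def rnn_step(h_prev, x):
    """h_t = tanh(W_hh·h_{t-1} + W_xh·x_t + b_h);  y_t = W_hy·h_t + b_y."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x + b_h)
    y = W_hy @ h + b_y
    return h, y

# The same function and parameters are applied at every "time" step.
h = np.zeros(H)
for t in range(5):
    h, y = rnn_step(h, rng.normal(size=D))
print(h.shape, y.shape)   # (4,) (2,)
```

Note that tanh keeps every entry of the hidden state in (-1, 1), which matters for the gradient discussion later in the lecture.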
RNN: Computational Graph
Example: Language model
• Task: Predicting the next character
• Training sequence: “hello”
• Vocabulary:
– [h, e, l, o]
• Encoding: One-hot
RNN: Predicting the next character
• Task: Predicting the next character
• Training sequence: “hello”
• Vocabulary:
– [h, e, l, o]
• Encoding: One-hot
• Model:
h_t = tanh(W_hh·h_{t-1} + W_xh·x_t)
RNN: Predicting the next character
• Task: Predicting the next character
• Training sequence: “hello”
• Vocabulary:
– [h, e, l, o]
• Encoding: One-hot
• Model:
h_t = tanh(W_hh·h_{t-1} + W_xh·x_t)
y_t = W_hy·h_t
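The “hello” example above can be sketched as follows. The vocabulary [h, e, l, o] and one-hot encoding come from the slide; the softmax over the output scores (to turn them into next-character probabilities) and the random, untrained weights are assumptions for the sake of a runnable illustration.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
idx = {c: i for i, c in enumerate(vocab)}
V, H = len(vocab), 3                  # vocabulary size, hidden size

rng = np.random.default_rng(1)
W_hh = rng.normal(size=(H, H)) * 0.1
W_xh = rng.normal(size=(H, V)) * 0.1
W_hy = rng.normal(size=(V, H)) * 0.1

def one_hot(c):
    x = np.zeros(V)
    x[idx[c]] = 1.0
    return x

h = np.zeros(H)
for c in "hell":                      # feed the training characters one by one
    h = np.tanh(W_hh @ h + W_xh @ one_hot(c))
    y = W_hy @ h                      # scores over the next character
    p = np.exp(y) / np.exp(y).sum()   # softmax: next-character distribution

print(p.round(3))                     # probabilities for [h, e, l, o]
```

After training, the model should assign high probability to the correct next character at each step (e.g. "o" after "hell"); with the random weights here the distribution is close to uniform.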
RNN: Predicting the next character
• Vocabulary:
– [h, e, l, o]
• At test time, we can sample from the model to synthesize new text.
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
Progress
Input-output structure of RNNs
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
• Many-to-many (encoder-decoder)
RNN: One-to-one
Normal feed forward neural network
• Task:
– Predict housing prices
RNN: One-to-many
Task:
• Image Captioning
– (Image → Sequence of words)
RNN: Many-to-one
Tasks:
• Sentiment classification
• Video classification
RNN: Many-to-many
Task:
• Video classification on frame level
• Text analysis (nouns, verbs)
RNN: Many-to-many (encoder-decoder)
Task:
• Machine Translation
(English → French)
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
Progress
Training Recurrent Neural Networks
• The challenge in training a recurrent neural network is to preserve long range
dependencies.
• Vanilla recurrent neural network
– h_t = f(W_hh·h_{t-1} + W_hx·x_t + b)
– f = tanh() can easily cause vanishing gradients
– f = relu() can easily cause exploding values and/or gradients
• Finite memory available
Exploding or vanishing gradients
• tanh() solves the exploding value problem
• tanh() does NOT solve the exploding gradient problem; think of a scalar input and a scalar hidden state:

h_t = tanh(W_hh·h_{t-1} + W_hx·x_t + b)

∂h_t/∂h_{t-1} = (1 − tanh²(W_hh·h_{t-1} + W_hx·x_t + b)) · W_hh

The gradient can explode/vanish exponentially in time (steps):
• If |W_hh| < 1: vanishing gradients
• If |W_hh| > 1: exploding gradients
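The scalar case above can be made concrete with a few lines of arithmetic. The sketch below takes tanh' at its maximum of 1, so the per-step gradient factor is just W_hh; the specific values 0.9, 1.1 and 50 steps are arbitrary choices to show the exponential behaviour.

```python
# Scalar illustration: across T steps the gradient picks up a factor of
# roughly tanh'(·) * W_hh per step. With tanh' taken as 1 (its maximum),
# the factor is W_hh itself, so |W_hh| < 1 shrinks the gradient
# exponentially and |W_hh| > 1 grows it exponentially.
def gradient_factor(w_hh, steps):
    grad = 1.0
    for _ in range(steps):
        grad *= w_hh          # one factor of W_hh per unrolled time step
    return grad

print(gradient_factor(0.9, 50))   # vanishes: roughly 0.005
print(gradient_factor(1.1, 50))   # explodes: roughly 117
```

In a real RNN the tanh' factors are below 1, which makes the vanishing case even worse than this sketch suggests.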
Exploding or vanishing gradients (for reference only)
• Conditions for vanishing and exploding gradients for the weight matrix, 𝑊ℎℎ:
– If largest singular value > 1: Exploding gradients
– If largest singular value < 1: Vanishing gradients
“On the difficulty of training Recurrent Neural Networks”, Pascanu et al., 2013
Gradient clipping
• To avoid exploding gradients, a simple trick is to set a threshold on the norm of the gradient matrix, ∂h_t/∂h_{t-1}.
• Error surface of a single hidden unit RNN
• High curvature walls
• Solid lines: standard gradient descent trajectories
• Dashed lines: gradients rescaled to a fixed size
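The clipping trick above amounts to rescaling a gradient whose norm exceeds a threshold while keeping its direction. A minimal numpy sketch, with an arbitrary threshold value:

```python
import numpy as np

def clip_by_norm(grad, threshold):
    """If ||grad|| exceeds the threshold, rescale grad to that norm."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([30.0, 40.0])      # norm 50
print(clip_by_norm(g, 5.0))     # rescaled to norm 5: [3. 4.]
```

Deep learning frameworks provide this operation directly (e.g. gradient-norm clipping utilities applied to all parameters just before the optimizer step).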
“On the difficulty of training Recurrent Neural Networks”, Pascanu et al., 2013
Vanishing gradients - “less of a problem”
• In contrast to feed-forward networks, RNNs will not stop learning despite vanishing gradients.
• The network gets “fresh” inputs each step, so the weights will be updated.
• The challenge is to learn long range dependencies. This can be improved using more advanced architectures.
Backpropagation through time (BPTT)
• Training of a recurrent neural network is done using backpropagation and
gradient descent algorithms
– To calculate gradients you have to keep the internal RNN values in
memory until you get the backpropagating gradient
– Then what if you are reading a book of 100 million characters?
Truncated backpropagation through time
• The solution is that you stop at some point and update the weights as if you were done.
• Advantages:
– Reducing the memory requirement
– Faster parameter updates
• Disadvantage:
– Not able to capture dependencies longer than the truncation length.
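The chunking scheme above can be sketched as a loop. This is a skeleton, not a trainable model: the RNN step uses fixed random weights, the chunk length and sequence length are arbitrary, and the comment marks where the per-chunk backward pass and weight update would happen.

```python
import numpy as np

rng = np.random.default_rng(2)
H, D = 4, 3
W_hh = rng.normal(size=(H, H)) * 0.1
W_xh = rng.normal(size=(H, D)) * 0.1

sequence = rng.normal(size=(100, D))   # a long sequence: 100 time steps
chunk_len = 20                          # truncation length

h = np.zeros(H)
num_updates = 0
for start in range(0, len(sequence), chunk_len):
    chunk = sequence[start:start + chunk_len]
    # The carried-in state h is treated as a constant for backprop, so the
    # gradient path is cut at every chunk boundary.
    for x in chunk:
        h = np.tanh(W_hh @ h + W_xh @ x)
    # <- here one would backpropagate through this chunk only, then update
    #    the weights as if the sequence were done.
    num_updates += 1

print(num_updates)   # 5 updates for a 100-step sequence with chunks of 20
```

This is why truncated BPTT gets more frequent updates and a bounded memory footprint, at the cost of never seeing dependencies longer than the chunk.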
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
Progress
Watch cs231n
• Watch from 31:35 to 39:00
• cs231n: Lecture 10 | Recurrent Neural Networks
Inputting Shakespeare
• Input a lot of text
• Try to predict the next character
• Learns spelling, quotes and punctuation
https://gist.github.com/karpathy/d4dee566867f8291f086
Inputting LaTeX
Inputting C code (Linux kernel)
Interpreting cell activations
• Semantic meaning of the hidden state vector
Visualizing and Understanding Recurrent Networks
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
Progress
Gated Recurrent Unit (GRU)
• A more advanced recurrent unit
• Uses gates to control information flow
• Used to improve learning of long-term dependencies
Gated Recurrent Unit (GRU)
Vanilla RNN:
• Cell: h_t = tanh(W_hx·x_t + W_hh·h_{t-1} + b)

GRU:
• Update gate: Γ_u = σ(W_u·x_t + U_u·h_{t-1} + b_u)
• Reset gate: Γ_r = σ(W_r·x_t + U_r·h_{t-1} + b_r)
• Candidate cell state: h̃_t = tanh(W·x_t + U·(Γ_r ∘ h_{t-1}) + b)
• Final cell state: h_t = Γ_u ∘ h_{t-1} + (1 − Γ_u) ∘ h̃_t

The GRU has the ability to add to and remove from the state, not only “transform the state”.
With Γ_r as ones and Γ_u as zeros, GRU → vanilla RNN
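The four GRU equations translate directly into a single numpy step. Shapes and the random weights are illustrative; ∘ is elementwise multiplication, which in numpy is `*`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H = 3, 4                           # illustrative input / hidden sizes
rng = np.random.default_rng(3)
W_u, U_u, b_u = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)
W_r, U_r, b_r = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)
W,   U,   b   = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)

def gru_step(h_prev, x):
    g_u = sigmoid(W_u @ x + U_u @ h_prev + b_u)        # update gate
    g_r = sigmoid(W_r @ x + U_r @ h_prev + b_r)        # reset gate
    h_cand = np.tanh(W @ x + U @ (g_r * h_prev) + b)   # candidate state
    return g_u * h_prev + (1 - g_u) * h_cand           # final state

h = gru_step(np.zeros(H), rng.normal(size=D))
print(h.shape)   # (4,)
```

Setting g_r to ones and g_u to zeros reduces the last line to the vanilla RNN update, matching the remark above.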
Gated Recurrent Unit (GRU)
• Update gate: Γ_u = σ(W_u·x_t + U_u·h_{t-1} + b_u)
• Reset gate: Γ_r = σ(W_r·x_t + U_r·h_{t-1} + b_r)
• Candidate cell state: h̃_t = tanh(W·x_t + U·(Γ_r ∘ h_{t-1}) + b)
• Final cell state: h_t = Γ_u ∘ h_{t-1} + (1 − Γ_u) ∘ h̃_t

• The update gate, Γ_u, can easily be close to 1 due to the sigmoid activation function. Copying the previous hidden state will prevent vanishing gradients.
• With Γ_u = 0 and Γ_r = 0, the GRU unit can forget the past.
• In a standard RNN, we are transforming the hidden state regardless of how useful the input is.
Long Short Term Memory (LSTM)
(for reference only)
| GRU | LSTM |
| --- | --- |
| Update gate: Γ_u = σ(W_u·x_t + U_u·h_{t-1} + b_u) | Forget gate: Γ_f = σ(W_f·x_t + U_f·h_{t-1} + b_f) |
| Reset gate: Γ_r = σ(W_r·x_t + U_r·h_{t-1} + b_r) | Input gate: Γ_i = σ(W_i·x_t + U_i·h_{t-1} + b_i) |
|  | Output gate: Γ_o = σ(W_o·x_t + U_o·h_{t-1} + b_o) |
| Candidate cell state: h̃_t = tanh(W·x_t + U·(Γ_r ∘ h_{t-1}) + b) | Potential cell memory: c̃_t = tanh(W_c·x_t + U_c·h_{t-1} + b_c) |
| Final cell state: h_t = Γ_u ∘ h_{t-1} + (1 − Γ_u) ∘ h̃_t | Final cell memory: c_t = Γ_f ∘ c_{t-1} + Γ_i ∘ c̃_t |
|  | Final cell state: h_t = Γ_o ∘ tanh(c_t) |
LSTM training
• One can initialize the biases of the forget
gate such that the values in the forget gate
are close to one.
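The LSTM column of the table can likewise be sketched as one numpy step. Shapes and random weights are illustrative; the positive forget-gate bias follows the training tip above, keeping Γ_f near one early on.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H = 3, 4                           # illustrative input / hidden sizes
rng = np.random.default_rng(4)
W_f, U_f, b_f = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.ones(H)
W_i, U_i, b_i = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)
W_o, U_o, b_o = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)
W_c, U_c, b_c = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)

def lstm_step(h_prev, c_prev, x):
    g_f = sigmoid(W_f @ x + U_f @ h_prev + b_f)      # forget gate (bias ~ 1)
    g_i = sigmoid(W_i @ x + U_i @ h_prev + b_i)      # input gate
    g_o = sigmoid(W_o @ x + U_o @ h_prev + b_o)      # output gate
    c_cand = np.tanh(W_c @ x + U_c @ h_prev + b_c)   # potential cell memory
    c = g_f * c_prev + g_i * c_cand                  # final cell memory
    h = g_o * np.tanh(c)                             # final cell state
    return h, c

h, c = lstm_step(np.zeros(H), np.zeros(H), rng.normal(size=D))
print(h.shape, c.shape)   # (4,) (4,)
```

Unlike the GRU, the LSTM carries a separate cell memory c_t alongside the hidden state h_t, which is what the forget/input gates operate on.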
Multi-layer Recurrent Neural Networks
• Multi-layer RNNs can be used to increase model capacity
• As with feed-forward neural networks, stacking layers creates higher-level feature representations
• Normally 2 or 3 layers deep, not as deep as convolutional networks
• More complex relationships in time
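Stacking can be sketched as follows: at each time step, layer 2 takes layer 1's hidden state as its input, and each layer has its own weights and its own hidden state. Sizes and random weights are illustrative.

```python
import numpy as np

D, H = 3, 4                           # input size, hidden size per layer
rng = np.random.default_rng(5)
layers = [
    {"W_xh": rng.normal(size=(H, D)) * 0.1, "W_hh": rng.normal(size=(H, H)) * 0.1},
    {"W_xh": rng.normal(size=(H, H)) * 0.1, "W_hh": rng.normal(size=(H, H)) * 0.1},
]

h = [np.zeros(H) for _ in layers]     # one hidden state per layer
for t in range(6):
    x = rng.normal(size=D)
    for i, layer in enumerate(layers):
        inp = x if i == 0 else h[i - 1]   # layer 2 reads layer 1's state
        h[i] = np.tanh(layer["W_hh"] @ h[i] + layer["W_xh"] @ inp)

print(len(h), h[1].shape)   # 2 (4,)
```

Note that layer 2's input-to-hidden matrix has shape (H, H), since its "input" is the H-dimensional state from the layer below.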
Bidirectional recurrent neural network
• The blocks can be vanilla, LSTM or GRU recurrent units
• Real-time vs. post-processing
Forward state: h_t^f = f(W_hx·x_t + W_hh·h_{t-1}^f)
Backward state: h_t^b = f(W_hx·x_t + W_hh·h_{t+1}^b)
Combined state: h_t = [h_t^f, h_t^b]
Output: y_t = g(W_hy·h_t + b)
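The bidirectional scheme above can be sketched as two independent passes over the sequence, one left-to-right and one right-to-left, whose hidden states are concatenated at each step. In practice the two directions have separate weights, as assumed here; sizes and random weights are illustrative, and the output layer is omitted.

```python
import numpy as np

D, H, T = 3, 4, 5                    # input size, hidden size, sequence length
rng = np.random.default_rng(6)
W_f = {"x": rng.normal(size=(H, D)) * 0.1, "h": rng.normal(size=(H, H)) * 0.1}
W_b = {"x": rng.normal(size=(H, D)) * 0.1, "h": rng.normal(size=(H, H)) * 0.1}
xs = rng.normal(size=(T, D))

def run(inputs, W):
    """One directional pass; returns the hidden state at every step."""
    h, states = np.zeros(H), []
    for x in inputs:
        h = np.tanh(W["x"] @ x + W["h"] @ h)
        states.append(h)
    return states

h_fwd = run(xs, W_f)                  # t = 0 .. T-1
h_bwd = run(xs[::-1], W_b)[::-1]      # t = T-1 .. 0, re-reversed to align
h_cat = [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]

print(len(h_cat), h_cat[0].shape)     # 5 (8,)
```

Because h_t^b depends on future inputs, the full sequence must be available before any output can be produced, which is the real-time vs. post-processing trade-off noted above.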
Bidirectional recurrent neural network
• Example:
– Task: Is the word part of a person’s name?
– Text 1: He said, “Teddy bears are on sale!”
– Text 2: He said, “Teddy Roosevelt was a great President!”
• Overview (Feed-forward and convolutional neural networks)
• Vanilla Recurrent Neural Network (RNN)
• Input-output structure of RNNs
• Training Recurrent Neural Networks
• Simple examples
• Advanced models
• Advanced examples
Progress
Image Captioning
• Combining a CNN with an RNN to generate descriptive text about an image.
• Learning an RNN to interpret top-level features from a CNN
Deep Visual-Semantic Alignments for Generating Image Descriptions
Image Captioning: Results
NeuralTalk2
Image Captioning: Failures
NeuralTalk2
CNN + RNN
• With an RNN on top of CNN features, you capture a larger time horizon
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Alternatives to RNN
(for reference only)
• Temporal convolutional networks (TCN)
– An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
• Attention based models
– Attention Is All You Need