PyConZA'17 Deep Learning for Computer Vision

106
Deep Learning for Computer Vision Executive-ML 2017/09/21 Neither Proprietary nor Confidential Please Distribute ;) Alex Conway alex @ numberboost.com @alxcnwy PyConZA’17

Transcript of PyConZA'17 Deep Learning for Computer Vision

Page 1: PyConZA'17 Deep Learning for Computer Vision

Deep Learning for Computer VisionExecutive-ML 2017/09/21

Neither Proprietary nor Confidential – Please Distribute ;)

Alex Conway

alex @ numberboost.com

@alxcnwy

PyConZA’17

Page 2: PyConZA'17 Deep Learning for Computer Vision

Hands up!

Page 3: PyConZA'17 Deep Learning for Computer Vision

Check out theDeep Learning Indaba

videos & practicals!

http://www.deeplearningindaba.com/videos.html

http://www.deeplearningindaba.com/practicals.html

Page 4: PyConZA'17 Deep Learning for Computer Vision

Deep Learning is Sexy (for a reason!)

4

Page 5: PyConZA'17 Deep Learning for Computer Vision

Image Classification

5

http://yann.lecun.com/exdb/mnist/

https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py

(99.25% test accuracy in 192 seconds and 70 lines of code)

Page 6: PyConZA'17 Deep Learning for Computer Vision

Image Classification

6

Page 7: PyConZA'17 Deep Learning for Computer Vision

Image Classification

7

ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et. Al.

Advances in Neural Information Processing Systems 25 (NIPS2012)

Page 8: PyConZA'17 Deep Learning for Computer Vision

https://research.googleblog.com/2017/06/supercharg

e-your-computer-vision-models.html

Object detection

Page 9: PyConZA'17 Deep Learning for Computer Vision

https://www.youtube.com/watch?v=VOC3huqHrss

Object detection

Page 10: PyConZA'17 Deep Learning for Computer Vision

Object detection

Page 11: PyConZA'17 Deep Learning for Computer Vision

Image Captioning & Visual Attention

XXX

11

https://einstein.ai/research/knowing-when-to-look-adaptive-

attention-via-a-visual-sentinel-for-image-captioning

Page 12: PyConZA'17 Deep Learning for Computer Vision

Image Q&A

12https://arxiv.org/pdf/1612.00837.pdf

Page 13: PyConZA'17 Deep Learning for Computer Vision

Video Q&A

XXX

13

https://www.youtube.com/watch?v=UeheTiBJ0Io

Page 14: PyConZA'17 Deep Learning for Computer Vision

Pix2Pix

https://affinelayer.com/pix2pix/

https://github.com/affinelayer/pix2pix-tensorflow14

Page 15: PyConZA'17 Deep Learning for Computer Vision

Pix2Pix

https://medium.com/towards-data-science/face2face-a-pix2pix-demo-

that-mimics-the-facial-expression-of-the-german-chancellor-

b6771d65bf66 15

Page 16: PyConZA'17 Deep Learning for Computer Vision

16

Original inputRear Window (1954)

Pix2pix outputFully Automated

RemasteredPainstakingly by Hand

https://hackernoon.co

m/remastering-classic-

films-in-tensorflow-

with-pix2pix-

f4d551fa0503

Page 17: PyConZA'17 Deep Learning for Computer Vision

Style Transfer

https://github.com/junyanz/CycleGAN17

Page 18: PyConZA'17 Deep Learning for Computer Vision

Style Transfer SORCERY

https://github.com/junyanz/CycleGAN18

Page 19: PyConZA'17 Deep Learning for Computer Vision

Real Fake News

https://www.youtube.com/watch?v=MVBe6_o4cMI

19

Page 20: PyConZA'17 Deep Learning for Computer Vision

Deep learning is MagicDeep learning is MagicDeep learning is EASY!

Page 21: PyConZA'17 Deep Learning for Computer Vision

1. What is a neural network?

2. What is a convolutional neural network?

3. How to use a convolutional neural

network

4. More advanced Methods

5. Case studies & applications 21

Page 22: PyConZA'17 Deep Learning for Computer Vision

1.What is a neural

network?

Page 23: PyConZA'17 Deep Learning for Computer Vision

What is a neuron?

24

• 3 inputs [x1,x2,x3]

• 3 weights [w1,w2,w3]

• Element-wise multiply and sum

• Apply activation function f

• Often add a bias too (weight of 1) – not

shown

Page 24: PyConZA'17 Deep Learning for Computer Vision

What is an Activation Function?

25

Sigmoid Tanh ReLU

Nonlinearities … “squashing functions” … transform neuron’s output

NB: sigmoid output in [0,1]

Page 25: PyConZA'17 Deep Learning for Computer Vision

What is a (Deep) Neural Network?

26

Inputs outputs

hidden

layer 1hidden

layer 2

hidden

layer 3

Outputs of one layer are inputs into the next layer

Page 26: PyConZA'17 Deep Learning for Computer Vision

How does a neural network learn?

27

• We need labelled examples “training data”

• We initialize network weights randomly and initially get random predictions

• For each labelled training data point, we calculate the error between the

network’s predictions and the ground-truth labels

• Use ‘backpropagation’ (chain rule), to update the network parameters

(weights + convolutional filters ) in the opposite direction to the error

Page 27: PyConZA'17 Deep Learning for Computer Vision

How does a neural network learn?

28

New

weight =Old

weight

Learning

rate-Gradient of

weight with

respect to Error( )x“How much

error increases

when we increase

this weight”

Page 28: PyConZA'17 Deep Learning for Computer Vision

Gradient Descent Interpretation

29

http://scs.ryerson.ca/~aharley/neural-networks/

Page 29: PyConZA'17 Deep Learning for Computer Vision

http://playground.tensorflow.org

Page 30: PyConZA'17 Deep Learning for Computer Vision

What is a Neural Network?

For much more detail, see:

1. Michael Nielson’s Neural Networks &

Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html

2. Anrej Karpathy’s CS231n Noteshttp://neuralnetworksanddeeplearning.com/chap1.html

31

Page 31: PyConZA'17 Deep Learning for Computer Vision

2. What is a

convolutional

neural network?

Page 32: PyConZA'17 Deep Learning for Computer Vision

What is a Convolutional Neural

Network?

33

“like a ordinary neural network but with

special types of layers that work well

on images”

(math works on numbers)

• Pixel = 3 colour channels (R, G, B)

• Pixel intensity ∈[0,255]

• Image has width w and height h

• Therefore image is w x h x 3 numbers

Page 33: PyConZA'17 Deep Learning for Computer Vision

34

This is VGGNet – don’t panic, we’ll break it down piece by piece

Example Architecture

Page 34: PyConZA'17 Deep Learning for Computer Vision

35

This is VGGNet – don’t panic, we’ll break it down piece by piece

Example Architecture

Page 35: PyConZA'17 Deep Learning for Computer Vision

Convolutions

36

http://setosa.io/ev/image-kernels/

Page 36: PyConZA'17 Deep Learning for Computer Vision

Convolutions

37http://deeplearning.net/software/theano/tutorial/conv_arithmetic.ht

ml

Page 37: PyConZA'17 Deep Learning for Computer Vision

New Layer Type: Convolutional Layer

38

• 2-d weighted average when multiply kernel over pixel patches

• We slide the kernel over all pixels of the image (handle

borders)

• Kernel starts off with “random” values and network updates

(learns) the kernel values (using backpropagation) to try

minimize loss

• Kernels shared across the whole image (parameter

sharing)

Page 38: PyConZA'17 Deep Learning for Computer Vision

Many Kernels = Many “Activation Maps” = Volume

39http://cs231n.github.io/convolutional-networks/

Page 39: PyConZA'17 Deep Learning for Computer Vision

New Layer Type: Convolutional Layer

40

Page 40: PyConZA'17 Deep Learning for Computer Vision

Convolutions

41

https://github.com/fchollet/keras/blob/master/examples/conv_filter_visuali

zation.py

Page 41: PyConZA'17 Deep Learning for Computer Vision

Convolutions

42

Page 42: PyConZA'17 Deep Learning for Computer Vision

Convolutions

43

Page 43: PyConZA'17 Deep Learning for Computer Vision

Convolutions

44

Page 44: PyConZA'17 Deep Learning for Computer Vision

Convolution Learn Hierarchical Features

45

Page 45: PyConZA'17 Deep Learning for Computer Vision

New Layer Type: Max Pooling

47

Page 46: PyConZA'17 Deep Learning for Computer Vision

New Layer Type: Max Pooling

• Reduces dimensionality from one layer to next

• …by replacing NxN sub-area with max value

• Makes network “look” at larger areas of the image at a time

• e.g. Instead of identifying fur, identify cat

• Reduces overfitting since losing information helps the network

generalize

48http://cs231n.github.io/convolutional-networks/

Page 47: PyConZA'17 Deep Learning for Computer Vision

New Layer Type: Max Pooling

49

Page 48: PyConZA'17 Deep Learning for Computer Vision

Stack Conv + Pooling Layers and Go Deep

50

Convolution + max pooling + fully connected +

softmax

Page 49: PyConZA'17 Deep Learning for Computer Vision

51

Stack Conv + Pooling Layers and Go Deep

Convolution + max pooling + fully connected +

softmax

Page 50: PyConZA'17 Deep Learning for Computer Vision

52

Stack These Layers and Go Deep

Convolution + max pooling + fully connected +

softmax

Page 51: PyConZA'17 Deep Learning for Computer Vision

53

Stack These Layers and Go Deep

Convolution + max pooling + fully connected +

softmax

Page 52: PyConZA'17 Deep Learning for Computer Vision

54

Flatten the Final “Bottleneck” layer

Convolution + max pooling + fully connected +

softmax

Flatten the final 7 x 7 x 512 max pooling layer

Add fully-connected dense layer on top

Page 53: PyConZA'17 Deep Learning for Computer Vision

55

Bringing it all together

Convolution + max pooling + fully connected +

softmax

Page 54: PyConZA'17 Deep Learning for Computer Vision

Softmax

Convert scores ∈ ℝ to probabilities ∈ [0,1]

Final output prediction = highest probability class

56

Page 55: PyConZA'17 Deep Learning for Computer Vision

Bringing it all together

57

Convolution + max pooling + fully connected +

softmax

Page 56: PyConZA'17 Deep Learning for Computer Vision

We need labelled training data!

Page 57: PyConZA'17 Deep Learning for Computer Vision

ImageNet

59http://image-net.org/explore

1000 object categories

1.2 million training images

Page 58: PyConZA'17 Deep Learning for Computer Vision

ImageNet

60

Page 59: PyConZA'17 Deep Learning for Computer Vision

ImageNet

61

Page 60: PyConZA'17 Deep Learning for Computer Vision

ImageNet Top 5 Error Rate

62

Traditional

Image Processing

Methods

AlexNet

8 Layers

ZFNet

8 Layers

GoogLeNe

t

22 LayersResNet

152 Layers SENet

Ensamble

TSNet

Ensamble

Page 61: PyConZA'17 Deep Learning for Computer Vision

3. How to use a

convolutional neural

network

Page 62: PyConZA'17 Deep Learning for Computer Vision

Using a Pre-Trained ImageNet-Winning

CNN

64

• We’ve been looking at “VGGNet”

• Oxford Visual Geometry Group (VGG)

• ImageNet 2014 Runner-up

• Network is 16 layers (deep!)

• Easy to fine-tune

https://blog.keras.io/building-powerful-image-classification-

models-using-very-little-data.html

Page 63: PyConZA'17 Deep Learning for Computer Vision

Example: Classifying Product Images

65

https://github.com/alexcnwy/CTDL_CNN_TALK_20

170620

Classifying

products

into 9

categories

Page 64: PyConZA'17 Deep Learning for Computer Vision

66

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-

data.html

Start with Pre-Trained ImageNet

Model

Consumers vs Producers of Machine Learning

Page 65: PyConZA'17 Deep Learning for Computer Vision

67

Page 66: PyConZA'17 Deep Learning for Computer Vision

“Transfer Learning”

is a game changer

Page 67: PyConZA'17 Deep Learning for Computer Vision

Fine-tuning A CNN To Solve A New Problem

• Cut off last layer of pre-trained Imagenet winning

CNN

• Keep learned network (convolutions) but replace

final layer

• Can learn to predict new (completely different)

classes

• Fine-tuning is re-training new final layer - learn for

new task

69

Page 68: PyConZA'17 Deep Learning for Computer Vision

Fine-tuning A CNN To Solve A New Problem

70

Page 69: PyConZA'17 Deep Learning for Computer Vision

71

Before Fine-Tuning

Page 70: PyConZA'17 Deep Learning for Computer Vision

72

After Fine-Tuning

Page 71: PyConZA'17 Deep Learning for Computer Vision

Fine-tuning A CNN To Solve A New Problem

• Fix weights in convolutional layers (set

trainable=False)

• Remove final dense layer that predicts 1000

ImageNet classes

• Replace with new dense layer to predict 9

categories

73

88% accuracy in under 2 minutes for

classifying products into categories

Fine-tuning is awesome!

Insert obligatory brain analogy

Page 72: PyConZA'17 Deep Learning for Computer Vision
Page 73: PyConZA'17 Deep Learning for Computer Vision

Visual Similarity

75

• Chop off last 2 VGG layers

• Use dense layer with 4096 activations

• Compute nearest neighbours in the space of these

activations

https://memeburn.com/2017/06/spree-image-search/

Page 74: PyConZA'17 Deep Learning for Computer Vision

76

https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

Page 75: PyConZA'17 Deep Learning for Computer Vision

77

Input Imagenot seen by

model

ResultsTop 10 most

“visually similar”

Page 76: PyConZA'17 Deep Learning for Computer Vision

78

Final Convolutional Layer = Semantic

Vector

• The final convolutional

layer encodes

everything the network

needs to make

predictions

• The dense layer added

on top and the softmax

layer both have lower

dimensionality

Page 77: PyConZA'17 Deep Learning for Computer Vision

4. More Advanced

Methods

Page 78: PyConZA'17 Deep Learning for Computer Vision

Use a Better Architecture (or all of

them!)

80

“Ensambles win”

learn a weighted average of many models’ predictions

Page 79: PyConZA'17 Deep Learning for Computer Vision

cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

There are MANY Computer Vision Tasks

Page 80: PyConZA'17 Deep Learning for Computer Vision

Long et al. “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015

Noh et al. Learning Deconvolution Network for Semantic Segmentation. IEEE on Computer Vision 2016

Semantic Segmentation

Page 81: PyConZA'17 Deep Learning for Computer Vision

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

Object detection

Page 82: PyConZA'17 Deep Learning for Computer Vision

http://blog.romanofoti.com/style_transfer/

Johnson et al. Perceptual losses for real-time style transfer and super-resolution. 2016

Style Transfer

f ( ) = ,

Page 83: PyConZA'17 Deep Learning for Computer Vision

https://www.youtube.com/watch?v=LhF_56SxrGk

Page 84: PyConZA'17 Deep Learning for Computer Vision

Pixelated OriginalOutput

https://arstec

hnica.com/inf

ormation-

technology/2

017/02/googl

e-brain-

super-

resolution-

zoom-

enhance/

Page 85: PyConZA'17 Deep Learning for Computer Vision

This image is

3.8 kb

Super-Resolution

Page 86: PyConZA'17 Deep Learning for Computer Vision

https://github.com/tdeboissiere/BGG16CAM-keras

Visual Attention

Page 87: PyConZA'17 Deep Learning for Computer Vision

Image Captioning

https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-

captioning

Karpathy & Li. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), pp. 664–676

Page 88: PyConZA'17 Deep Learning for Computer Vision

Image Q&A

XXX

92

http://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook

Page 89: PyConZA'17 Deep Learning for Computer Vision

Video Q&A

XXX

93

https://www.youtube.com/watch?v=UeheTiBJ0Io

Page 90: PyConZA'17 Deep Learning for Computer Vision

Video Q&A

XXX

94

https://www.youtube.com/watch?v=UeheTiBJ0Io

Page 91: PyConZA'17 Deep Learning for Computer Vision

king + woman – man ≈ queen

95

Frome et al. (2013) ‘DeViSE: A Deep Visual-Semantic Embedding Model’,

Advances in Neural Information Processing Systems, pp. 2121–2129

CNN + Word2Vec = AWESOME

Page 92: PyConZA'17 Deep Learning for Computer Vision

DeViSE: A Deep Visual-Semantic Embedding Model

XXX

96

Before:

Encode labels as 1-hot vector

Page 93: PyConZA'17 Deep Learning for Computer Vision

DeViSE: A Deep Visual-Semantic Embedding Model

XXX

97

After:

Encode labels as word2vec vectors

(FROM A SEPARATE MODEL)

Can look these up for all the nouns in ImageNet

300-d

word2vec

vectors

Page 94: PyConZA'17 Deep Learning for Computer Vision

DeViSE: A Deep Visual-Semantic Embedding Model

wv(fish)+ wv(net)

98

https://www.youtube.com/watch?v=uv0gmrXSXVg

2wv* =

…get nearest neighbours to wv*

Page 95: PyConZA'17 Deep Learning for Computer Vision

5. Case Studies

Page 96: PyConZA'17 Deep Learning for Computer Vision

Estimating Accident Repair Cost from

Photos

TODO

100

Prototype for

large SA

insurer

Detect car

make &

model from

registration

disk

Predict repair

cost using

learned

model

Page 97: PyConZA'17 Deep Learning for Computer Vision

Image & Video Moderation

TODO

101

Large international gay dating app with tens of

millions of users

uploading hundreds-of-thousands of photos per day

Page 98: PyConZA'17 Deep Learning for Computer Vision

Segmenting Medical Images

TODO

102

Page 99: PyConZA'17 Deep Learning for Computer Vision

m

Counting People

TODO

Count shoppers, segment on age & gender

facial recognition loyalty is next

Page 100: PyConZA'17 Deep Learning for Computer Vision

Counting Cars

TODO

Page 101: PyConZA'17 Deep Learning for Computer Vision

Detecting Potholes

Page 102: PyConZA'17 Deep Learning for Computer Vision

Extracting Data from Product

Catalogues

Page 103: PyConZA'17 Deep Learning for Computer Vision

Real-time ATM Video Classification

107

Page 104: PyConZA'17 Deep Learning for Computer Vision

Optical Sorting

TODO

108

https://www.youtube.com/watch?v=Xf7jaxwnyso

Page 105: PyConZA'17 Deep Learning for Computer Vision

We are hiring!

alex @ numberboost.com

Page 106: PyConZA'17 Deep Learning for Computer Vision

GET IN TOUCH!

Alex Conway

alex @ numberboost.com

@alxcnwy