Deep Learning for Computer Vision - ExecutiveML

94
Deep Learning for Computer Vision Executive-ML 2017/09/21 Neither Proprietary nor Confidential – Please Distribute ;) Alex Conway alex @ numberboost.com @alxcnwy

Transcript of Deep Learning for Computer Vision - ExecutiveML

Page 1: Deep Learning for Computer Vision - ExecutiveML

Deep Learning for Computer VisionExecutive-ML 2017/09/21

Neither Proprietary nor Confidential – Please Distribute ;)

Alex Conway

alex @ numberboost.com@alxcnwy

Page 2: Deep Learning for Computer Vision - ExecutiveML

Hands up!

Page 3: Deep Learning for Computer Vision - ExecutiveML

Check out theDeep Learning Indaba

videos & practicals!

http://www.deeplearningindaba.com/videos.htmlhttp://www.deeplearningindaba.com/practicals.html

Page 4: Deep Learning for Computer Vision - ExecutiveML

Deep Learning is Sexy (for a reason!)

4

Page 5: Deep Learning for Computer Vision - ExecutiveML

Image Classification

5

http://yann.lecun.com/exdb/mnist/

https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py(99.25% test accuracy in 192 seconds and 70 lines of code)

Page 6: Deep Learning for Computer Vision - ExecutiveML

Image Classification

6

Page 7: Deep Learning for Computer Vision - ExecutiveML

Image Classification

7

Page 8: Deep Learning for Computer Vision - ExecutiveML

https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html

Object detection

Page 9: Deep Learning for Computer Vision - ExecutiveML

https://www.youtube.com/watch?v=VOC3huqHrss

Object detection

Page 10: Deep Learning for Computer Vision - ExecutiveML

Image Captioning & Visual Attention

XXX

10

https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning

Page 11: Deep Learning for Computer Vision - ExecutiveML

Image Q&A

11https://arxiv.org/pdf/1612.00837.pdf

Page 12: Deep Learning for Computer Vision - ExecutiveML

Video Q&A

XXX

12

https://www.youtube.com/watch?v=UeheTiBJ0Io

Page 13: Deep Learning for Computer Vision - ExecutiveML

Pix2Pix

https://affinelayer.com/pix2pix/https://github.com/affinelayer/pix2pix-tensorflow

13

Page 14: Deep Learning for Computer Vision - ExecutiveML

Pix2Pix

https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66

14

Page 15: Deep Learning for Computer Vision - ExecutiveML

15

Original inputRear Window (1954)

Pix2pix outputFully Automated

RemasteredPainstakingly by Hand

https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503

Page 16: Deep Learning for Computer Vision - ExecutiveML

Style Transfer

https://github.com/junyanz/CycleGAN16

Page 17: Deep Learning for Computer Vision - ExecutiveML

Style Transfer MAGIC

https://github.com/junyanz/CycleGAN17

Page 18: Deep Learning for Computer Vision - ExecutiveML

1. What is a neural network?

2. What is a convolutional neural network?

3. How to use a convolutional neural network

4. More advanced Methods

5. Case studies & applications

18

Page 19: Deep Learning for Computer Vision - ExecutiveML

Big Shout Outs

Jeremy Howard & Rachel Thomashttp://course.fast.ai

Andrej Karpathyhttp://cs231n.github.io

François Chollet (Keras lead dev)https://keras.io/

19

Page 20: Deep Learning for Computer Vision - ExecutiveML

1.What is a neural network?

Page 21: Deep Learning for Computer Vision - ExecutiveML

What is a neuron?

21

• 3 inputs [x1,x2,x3] • 3 weights [w1,w2,w3]• Element-wise multiply and sum • Apply activation function f• Often add a bias too (weight of 1) – not shown

Page 22: Deep Learning for Computer Vision - ExecutiveML

What is an Activation Function?

22

Sigmoid Tanh ReLU

Nonlinearities … “squashing functions” … transform neuron’s outputNB: sigmoid output in [0,1]

Page 23: Deep Learning for Computer Vision - ExecutiveML

What is a (Deep) Neural Network?

23

Inputs outputs

hiddenlayer 1

hiddenlayer 2

hiddenlayer 3

Outputs of one layer are inputs into the next layer

Page 24: Deep Learning for Computer Vision - ExecutiveML

How does a neural network learn?

24

• You need labelled examples “training data”

• Initially, the network makes random predictions (weights initialized randomly)

• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (aka “loss function”)

• Use ‘backpropagation’ (really just the chain rule), to update the network parameters (weights) in the opposite direction to the error

Page 25: Deep Learning for Computer Vision - ExecutiveML

How does a neural network learn?

25

New weight = Old

weightLearning

rate-Gradient of weight with

respect to Error( )x“How much

error increaseswhen we increase

this weight”

Page 26: Deep Learning for Computer Vision - ExecutiveML

Gradient Descent Interpretation

26

http://scs.ryerson.ca/~aharley/neural-networks/

Page 27: Deep Learning for Computer Vision - ExecutiveML

http://playground.tensorflow.org

Page 28: Deep Learning for Computer Vision - ExecutiveML

What is a Neural Network?

For much more detail, see:

1. Michael Nielson’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html

2. Anrej Karpathy’s CS231n Noteshttp://neuralnetworksanddeeplearning.com/chap1.html

28

Page 29: Deep Learning for Computer Vision - ExecutiveML

2. What is a convolutional

neural network?

Page 30: Deep Learning for Computer Vision - ExecutiveML

What is a Convolutional Neural Network?

30

“like a ordinary neural network but with special types of layers that work well on images”

(math works on numbers)

• Pixel = 3 colour channels (R, G, B)

• Pixel intensity ∈[0,255]• Image has width w and height h• Therefore image is w x h x 3 numbers

Page 31: Deep Learning for Computer Vision - ExecutiveML

31This is VGGNet – don’t panic, we’ll break it down piece by piece

Example Architecture

Page 32: Deep Learning for Computer Vision - ExecutiveML

32This is VGGNet – don’t panic, we’ll break it down piece by piece

Example Architecture

Page 33: Deep Learning for Computer Vision - ExecutiveML

Convolutions

33

http://setosa.io/ev/image-kernels/

Page 34: Deep Learning for Computer Vision - ExecutiveML

Convolutions

34http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

Page 35: Deep Learning for Computer Vision - ExecutiveML

New Layer Type: Convolutional Layer

35

• 2-d weighted average when multiply kernel over pixel patches• We slide the kernel over all pixels of the image (handle borders)• Kernel starts off with “random” values and network updates (learns)

the kernel values (using backpropagation) to try minimize loss• Kernels shared across the whole image (parameter sharing)

Page 36: Deep Learning for Computer Vision - ExecutiveML

Many Kernels = Many “Activation Maps” = Volume

36http://cs231n.github.io/convolutional-networks/

Page 37: Deep Learning for Computer Vision - ExecutiveML

New Layer Type: Convolutional Layer

37

Page 38: Deep Learning for Computer Vision - ExecutiveML

Convolutions

38

https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py

Page 39: Deep Learning for Computer Vision - ExecutiveML

Convolutions

39

Page 40: Deep Learning for Computer Vision - ExecutiveML

Convolutions

40

Page 41: Deep Learning for Computer Vision - ExecutiveML

Convolutions

41

Page 42: Deep Learning for Computer Vision - ExecutiveML

Convolution Learn Heirarchial Features

42

Page 43: Deep Learning for Computer Vision - ExecutiveML

Great vid

43

https://www.youtube.com/watch?v=AgkfIQ4IGaM

Page 44: Deep Learning for Computer Vision - ExecutiveML

New Layer Type: Max Pooling

44

Page 45: Deep Learning for Computer Vision - ExecutiveML

New Layer Type: Max Pooling

• Reduces dimensionality from one layer to next• …by replacing NxN sub-area with max value• Makes network “look” at larger areas of the image at a time• e.g. Instead of identifying fur, identify cat• Reduces overfitting since losing information helps the network generalize

45

Page 46: Deep Learning for Computer Vision - ExecutiveML

New Layer Type: Max Pooling

46

Page 47: Deep Learning for Computer Vision - ExecutiveML

Softmax

• Convert scores∈ℝ to probabilities∈ [0,1]

• Then predict the class with highest probability

47

Page 48: Deep Learning for Computer Vision - ExecutiveML

Bringing it all together

48

Convolution + max pooling + fully connected + softmax

Page 49: Deep Learning for Computer Vision - ExecutiveML

Bringing it all together

49

Convolution + max pooling + fully connected + softmax

Page 50: Deep Learning for Computer Vision - ExecutiveML

Bringing it all together

50

Convolution + max pooling + fully connected + softmax

Page 51: Deep Learning for Computer Vision - ExecutiveML

Bringing it all together

51

Convolution + max pooling + fully connected + softmax

Page 52: Deep Learning for Computer Vision - ExecutiveML

Bringing it all together

52

Convolution + max pooling + fully connected + softmax

Page 53: Deep Learning for Computer Vision - ExecutiveML

Bringing it all together

53

Convolution + max pooling + fully connected + softmax

Page 54: Deep Learning for Computer Vision - ExecutiveML

We need labelled training data!

Page 55: Deep Learning for Computer Vision - ExecutiveML

ImageNet

55http://image-net.org/explore

1000 object categories

1.2 million training images

Page 56: Deep Learning for Computer Vision - ExecutiveML

ImageNet

56

Page 57: Deep Learning for Computer Vision - ExecutiveML

ImageNet

57

Page 58: Deep Learning for Computer Vision - ExecutiveML

ImageNet Top 5 Error Rate

58

TraditionalImage Processing Methods

AlexNet8 Layers

ZFNet8 Layers

GoogLeNet22 Layers ResNet

152 Layers SENetEnsamble

TSNetEnsamble

Page 59: Deep Learning for Computer Vision - ExecutiveML

3. How to use a convolutional neural

network

Page 60: Deep Learning for Computer Vision - ExecutiveML

Using a Pre-Trained ImageNet-Winning CNN

60

• We’ve been looking at “VGGNet”

• Oxford Visual Geometry Group (VGG)

• ImageNet 2014 Runner-up

• Network is 16 layers (deep!)

• Easy to fine-tune

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Page 61: Deep Learning for Computer Vision - ExecutiveML

Example: Classifying Product Images

61

https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

Classifying

products into

9 categories

Page 62: Deep Learning for Computer Vision - ExecutiveML

62https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Start with Pre-Trained ImageNet Model

Page 63: Deep Learning for Computer Vision - ExecutiveML

“Transfer Learning”is a game changer

Page 64: Deep Learning for Computer Vision - ExecutiveML

Fine-tuning A CNN To Solve A New Problem• Cut off last layer of pre-trained Imagenet winning CNN• Keep learned network (convolutions) but replace final layer• Can learn to predict new (completely different) classes• Fine-tuning is re-training new final layer - learn for new task

64

Page 65: Deep Learning for Computer Vision - ExecutiveML

Fine-tuning A CNN To Solve A New Problem

65

Page 66: Deep Learning for Computer Vision - ExecutiveML

66

Before Fine-Tuning

Page 67: Deep Learning for Computer Vision - ExecutiveML

67

After Fine-Tuning

Page 68: Deep Learning for Computer Vision - ExecutiveML

Fine-tuning A CNN To Solve A New Problem

• Fix weights in convolutional layers (set trainable=False)

• Remove final dense layer that predicts 1000 ImageNet classes

• Replace with new dense layer to predict 9 categories

68

88% accuracy in under 2 minutes for classifying products into categories

Fine-tuning is awesome!Insert obligatory brain analogy

Page 69: Deep Learning for Computer Vision - ExecutiveML

Visual Similarity

69

• Chop off last 2 VGG layers

• Use dense layer with 4096 activations

• Compute nearest neighbours in the space of these activations

https://memeburn.com/2017/06/spree-image-search/

Page 70: Deep Learning for Computer Vision - ExecutiveML

70

https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

Page 71: Deep Learning for Computer Vision - ExecutiveML

71

Input Imagenot seen by model

ResultsTop 10 most “visually similar”

Page 72: Deep Learning for Computer Vision - ExecutiveML

4. More Advanced Methods

Page 73: Deep Learning for Computer Vision - ExecutiveML

Use a Better Architecture (or all of them!)

73

“Ensambles win”

learn a weighted average of many models’ predictions

Page 74: Deep Learning for Computer Vision - ExecutiveML

cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

There are MANY Computer Vision Tasks

Page 75: Deep Learning for Computer Vision - ExecutiveML

> Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015

> Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015

Semantic Segmentation

Page 76: Deep Learning for Computer Vision - ExecutiveML

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

Object detection

Page 77: Deep Learning for Computer Vision - ExecutiveML

Object detection

Page 78: Deep Learning for Computer Vision - ExecutiveML

https://www.youtube.com/watch?v=VOC3huqHrss

Object detection

Page 79: Deep Learning for Computer Vision - ExecutiveML

http://blog.romanofoti.com/style_transfer/https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py

Style TransferLoss = content_loss + style_loss

Content loss = convolutions from pre-trained networkStyle loss = gram matrix from style image convolutions

Page 80: Deep Learning for Computer Vision - ExecutiveML

Video Q&A

XXX

80

https://www.youtube.com/watch?v=UeheTiBJ0Io

Page 81: Deep Learning for Computer Vision - ExecutiveML

Video Q&A

XXX

81

https://www.youtube.com/watch?v=UeheTiBJ0Io

Page 82: Deep Learning for Computer Vision - ExecutiveML

5. Case Studies

Page 83: Deep Learning for Computer Vision - ExecutiveML

Image & Video Moderation

TODO

83

Large international gay dating app with tens of millions of users uploading hundreds-of-thousands of photos per day

Page 84: Deep Learning for Computer Vision - ExecutiveML

Estimating Accident Repair Cost from Photos

TODO

84

Prototype for large SA insurer

Detect car make & model from registration disk

Predict repair cost using learned model

Page 85: Deep Learning for Computer Vision - ExecutiveML

Optical Sorting

TODO

85

https://www.youtube.com/watch?v=Xf7jaxwnyso

Page 86: Deep Learning for Computer Vision - ExecutiveML

Segmenting Medical Images

TODO

86

Page 87: Deep Learning for Computer Vision - ExecutiveML

m

Counting People

TODO

Count shoppers, segment on age & genderfacial recognition loyalty is next

Page 88: Deep Learning for Computer Vision - ExecutiveML

Counting Cars

TODO

Page 89: Deep Learning for Computer Vision - ExecutiveML

Wine A.I.

Page 90: Deep Learning for Computer Vision - ExecutiveML

Detecting Potholes

Page 91: Deep Learning for Computer Vision - ExecutiveML

GET IN TOUCH!

Alex Conway

alex @ numberboost.com@alxcnwy

Page 92: Deep Learning for Computer Vision - ExecutiveML

http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/

Computer Vision Pipelines

https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html

Page 93: Deep Learning for Computer Vision - ExecutiveML

https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/

Page 94: Deep Learning for Computer Vision - ExecutiveML

Practical Tips• use a GPU – AWS p2 instances (use spot!)

• when overfitting (validation_accuracy <<< training_accuracy)– Increase dropout – Early stopping

• when underfitting (low training_accuracy)1. Add more data2. Use data augmentation

– Flipping / stretching / rotating– Slightly change hues / brightness

3. Use more complex model– Resnet / Inception / Densenet– Ensamble (average <<< weighted average with learned weights)

94