Deep Learning for Computer Vision - ExecutiveML
-
Upload
alex-conway -
Category
Data & Analytics
-
view
380 -
download
0
Transcript of Deep Learning for Computer Vision - ExecutiveML
Deep Learning for Computer VisionExecutive-ML 2017/09/21
Neither Proprietary nor Confidential – Please Distribute ;)
Alex Conway
alex @ numberboost.com@alxcnwy
Hands up!
Check out theDeep Learning Indaba
videos & practicals!
http://www.deeplearningindaba.com/videos.htmlhttp://www.deeplearningindaba.com/practicals.html
Deep Learning is Sexy (for a reason!)
4
Image Classification
5
http://yann.lecun.com/exdb/mnist/
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py(99.25% test accuracy in 192 seconds and 70 lines of code)
Image Classification
6
Image Classification
7
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
Image Captioning & Visual Attention
XXX
10
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning
Image Q&A
11https://arxiv.org/pdf/1612.00837.pdf
Video Q&A
XXX
12
https://www.youtube.com/watch?v=UeheTiBJ0Io
Pix2Pix
https://affinelayer.com/pix2pix/https://github.com/affinelayer/pix2pix-tensorflow
13
Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
14
15
Original inputRear Window (1954)
Pix2pix outputFully Automated
RemasteredPainstakingly by Hand
https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
Style Transfer
https://github.com/junyanz/CycleGAN16
Style Transfer MAGIC
https://github.com/junyanz/CycleGAN17
1. What is a neural network?
2. What is a convolutional neural network?
3. How to use a convolutional neural network
4. More advanced Methods
5. Case studies & applications
18
Big Shout Outs
Jeremy Howard & Rachel Thomashttp://course.fast.ai
Andrej Karpathyhttp://cs231n.github.io
François Chollet (Keras lead dev)https://keras.io/
19
1.What is a neural network?
What is a neuron?
21
• 3 inputs [x1,x2,x3] • 3 weights [w1,w2,w3]• Element-wise multiply and sum • Apply activation function f• Often add a bias too (weight of 1) – not shown
What is an Activation Function?
22
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transform neuron’s outputNB: sigmoid output in [0,1]
What is a (Deep) Neural Network?
23
Inputs outputs
hiddenlayer 1
hiddenlayer 2
hiddenlayer 3
Outputs of one layer are inputs into the next layer
How does a neural network learn?
24
• You need labelled examples “training data”
• Initially, the network makes random predictions (weights initialized randomly)
• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (aka “loss function”)
• Use ‘backpropagation’ (really just the chain rule), to update the network parameters (weights) in the opposite direction to the error
How does a neural network learn?
25
New weight = Old
weightLearning
rate-Gradient of weight with
respect to Error( )x“How much
error increaseswhen we increase
this weight”
Gradient Descent Interpretation
26
http://scs.ryerson.ca/~aharley/neural-networks/
http://playground.tensorflow.org
What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html
2. Anrej Karpathy’s CS231n Noteshttp://neuralnetworksanddeeplearning.com/chap1.html
28
2. What is a convolutional
neural network?
What is a Convolutional Neural Network?
30
“like a ordinary neural network but with special types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity ∈[0,255]• Image has width w and height h• Therefore image is w x h x 3 numbers
31This is VGGNet – don’t panic, we’ll break it down piece by piece
Example Architecture
32This is VGGNet – don’t panic, we’ll break it down piece by piece
Example Architecture
Convolutions
33
http://setosa.io/ev/image-kernels/
Convolutions
34http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
New Layer Type: Convolutional Layer
35
• 2-d weighted average when multiply kernel over pixel patches• We slide the kernel over all pixels of the image (handle borders)• Kernel starts off with “random” values and network updates (learns)
the kernel values (using backpropagation) to try minimize loss• Kernels shared across the whole image (parameter sharing)
Many Kernels = Many “Activation Maps” = Volume
36http://cs231n.github.io/convolutional-networks/
New Layer Type: Convolutional Layer
37
Convolutions
38
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
Convolutions
39
Convolutions
40
Convolutions
41
Convolution Learn Heirarchial Features
42
Great vid
43
https://www.youtube.com/watch?v=AgkfIQ4IGaM
New Layer Type: Max Pooling
44
New Layer Type: Max Pooling
• Reduces dimensionality from one layer to next• …by replacing NxN sub-area with max value• Makes network “look” at larger areas of the image at a time• e.g. Instead of identifying fur, identify cat• Reduces overfitting since losing information helps the network generalize
45
New Layer Type: Max Pooling
46
Softmax
• Convert scores∈ℝ to probabilities∈ [0,1]
• Then predict the class with highest probability
47
Bringing it all together
48
Convolution + max pooling + fully connected + softmax
Bringing it all together
49
Convolution + max pooling + fully connected + softmax
Bringing it all together
50
Convolution + max pooling + fully connected + softmax
Bringing it all together
51
Convolution + max pooling + fully connected + softmax
Bringing it all together
52
Convolution + max pooling + fully connected + softmax
Bringing it all together
53
Convolution + max pooling + fully connected + softmax
We need labelled training data!
ImageNet
55http://image-net.org/explore
1000 object categories
1.2 million training images
ImageNet
56
ImageNet
57
ImageNet Top 5 Error Rate
58
TraditionalImage Processing Methods
AlexNet8 Layers
ZFNet8 Layers
GoogLeNet22 Layers ResNet
152 Layers SENetEnsamble
TSNetEnsamble
3. How to use a convolutional neural
network
Using a Pre-Trained ImageNet-Winning CNN
60
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• ImageNet 2014 Runner-up
• Network is 16 layers (deep!)
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Example: Classifying Product Images
61
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Classifying
products into
9 categories
62https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Start with Pre-Trained ImageNet Model
“Transfer Learning”is a game changer
Fine-tuning A CNN To Solve A New Problem• Cut off last layer of pre-trained Imagenet winning CNN• Keep learned network (convolutions) but replace final layer• Can learn to predict new (completely different) classes• Fine-tuning is re-training new final layer - learn for new task
64
Fine-tuning A CNN To Solve A New Problem
65
66
Before Fine-Tuning
67
After Fine-Tuning
Fine-tuning A CNN To Solve A New Problem
• Fix weights in convolutional layers (set trainable=False)
• Remove final dense layer that predicts 1000 ImageNet classes
• Replace with new dense layer to predict 9 categories
68
88% accuracy in under 2 minutes for classifying products into categories
Fine-tuning is awesome!Insert obligatory brain analogy
Visual Similarity
69
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
https://memeburn.com/2017/06/spree-image-search/
70
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
71
Input Imagenot seen by model
ResultsTop 10 most “visually similar”
4. More Advanced Methods
Use a Better Architecture (or all of them!)
73
“Ensambles win”
learn a weighted average of many models’ predictions
cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
There are MANY Computer Vision Tasks
> Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015
> Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015
Semantic Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Object detection
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
http://blog.romanofoti.com/style_transfer/https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Style TransferLoss = content_loss + style_loss
Content loss = convolutions from pre-trained networkStyle loss = gram matrix from style image convolutions
Video Q&A
XXX
80
https://www.youtube.com/watch?v=UeheTiBJ0Io
Video Q&A
XXX
81
https://www.youtube.com/watch?v=UeheTiBJ0Io
5. Case Studies
Image & Video Moderation
TODO
83
Large international gay dating app with tens of millions of users uploading hundreds-of-thousands of photos per day
Estimating Accident Repair Cost from Photos
TODO
84
Prototype for large SA insurer
Detect car make & model from registration disk
Predict repair cost using learned model
Optical Sorting
TODO
85
https://www.youtube.com/watch?v=Xf7jaxwnyso
Segmenting Medical Images
TODO
86
m
Counting People
TODO
Count shoppers, segment on age & genderfacial recognition loyalty is next
Counting Cars
TODO
Wine A.I.
Detecting Potholes
GET IN TOUCH!
Alex Conway
alex @ numberboost.com@alxcnwy
http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/
Computer Vision Pipelines
https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html
https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/
Practical Tips• use a GPU – AWS p2 instances (use spot!)
• when overfitting (validation_accuracy <<< training_accuracy)– Increase dropout – Early stopping
• when underfitting (low training_accuracy)1. Add more data2. Use data augmentation
– Flipping / stretching / rotating– Slightly change hues / brightness
3. Use more complex model– Resnet / Inception / Densenet– Ensamble (average <<< weighted average with learned weights)
94