Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep Features
Piotr Teterwak, Machine Learning Engineer, Dato

Hello, my name is Piotr Teterwak, Machine Learning Engineer at Dato.
#
Who is Dato?
#
GraphLab Create: Production ML Pipeline
DATA → Data cleaning & feature eng → ML Algorithm → Offline eval & parameter search → Deploy model → Your web service or intelligent app
Data engineering | Data intelligence | Deployment
Goal: a platform to help implement, manage, and optimize the entire pipeline
Deep Learning
Today's talk
Features are key to machine learning
Simple example: spam filtering. A user opens an email. Will she think it's spam?
What's the probability the email is spam?
Input x: text of email, user info, source info → MODEL → Output: probability of y (Yes! / No)
#
Feature engineering: the painful black art of transforming raw inputs into useful inputs for an ML algorithm, e.g., important words or complex transformations of the input.
Input x: text of email, user info, source info → Feature extraction → Features: φ(x) → MODEL → Output: probability of y
#
Deep Learning for Learning Features
Linear classifiers: the most common classifiers, e.g., logistic regression and SVMs.
Decisions correspond to a hyperplane: a line in high-dimensional space.
w0 + w1 x1 + w2 x2 = 0 (decision boundary)
w0 + w1 x1 + w2 x2 > 0 (one class); w0 + w1 x1 + w2 x2 < 0 (the other)
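The decision rule above can be sketched in a few lines. The weights here are illustrative defaults, not values from the slides:

```python
# A minimal sketch of a linear classifier's decision rule:
# the sign of w0 + w1*x1 + w2*x2 picks a side of the hyperplane.
# The default weights are made up for illustration.

def linear_classify(x1, x2, w0=-1.0, w1=1.0, w2=1.0):
    """Return 1 if the point is on the positive side of the hyperplane, else 0."""
    score = w0 + w1 * x1 + w2 * x2
    return 1 if score > 0 else 0

print(linear_classify(2.0, 1.0))  # score = 2.0 > 0, prints 1
print(linear_classify(0.0, 0.5))  # score = -0.5 < 0, prints 0
```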
#
What can a simple linear classifier represent?
AND
#
What can a simple linear classifier represent?
OR
#
What can't a simple linear classifier represent?
XOR
Need non-linear features
#
Non-linear feature embedding
#
Graph representation of a classifier (useful for defining neural networks):
Inputs x1, x2, …, xd (plus a constant 1) feed an output node y1 with weights w0, w1, w2, …, wd.
If w0 + w1 x1 + w2 x2 + … + wd xd > 0, output 1; if < 0, output 0.
#
What can a linear classifier represent?
x1 OR x2: weights w0 = -0.5, w1 = 1, w2 = 1
x1 AND x2: weights w0 = -1.5, w1 = 1, w2 = 1
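These OR and AND units can be checked directly as thresholded linear models, using the weights from the slide (w0 = -0.5 for OR, w0 = -1.5 for AND, both with w1 = w2 = 1):

```python
# OR and AND as single thresholded linear units, with the slide's weights.

def unit(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

def OR(x1, x2):
    return unit(-0.5, 1, 1, x1, x2)

def AND(x1, x2):
    return unit(-1.5, 1, 1, x1, x2)

# Print the full truth tables.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, OR(x1, x2), AND(x1, x2))
```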
#
Solving the XOR problem: adding a layer
XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
z1: weights w0 = -0.5, w1 = 1, w2 = -1
z2: weights w0 = -0.5, w1 = -1, w2 = 1
y: weights w0 = -0.5, with weight 1 on each of z1 and z2
All outputs thresholded to 0 or 1
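The two-layer XOR construction above can be verified in code. The hidden units compute z1 = x1 AND NOT x2 and z2 = NOT x1 AND x2, and the output ORs them:

```python
# The two-layer XOR network: each node is a thresholded linear unit.

def step(s):
    return 1 if s > 0 else 0

def xor(x1, x2):
    z1 = step(-0.5 + 1 * x1 - 1 * x2)   # x1 AND NOT x2
    z2 = step(-0.5 - 1 * x1 + 1 * x2)   # NOT x1 AND x2
    return step(-0.5 + z1 + z2)          # z1 OR z2

print([xor(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```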
Deep Neural Networks
[Figure: multi-layer network, http://deeplearning.stanford.edu/wiki/images/4/40/Network3322.png]
Outputs: P(cat|x), P(dog|x)
#
Deep Neural Networks: can model any function with enough hidden units. This is tremendously powerful: given enough units, it is possible to train a neural network to solve arbitrarily difficult problems. But they are also very difficult to train: too many parameters means too much memory and computation time.
#
Neural Nets and GPUs: many operations in neural net training can happen in parallel. Training reduces to matrix operations, many of which are easily parallelized on a GPU.
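The "reduces to matrix operations" point can be made concrete: one layer's activations for an entire batch is a single matrix multiply plus a nonlinearity. The shapes below are illustrative:

```python
# A whole layer, a whole batch: one matmul + elementwise nonlinearity.
# This is the structure GPUs parallelize so well.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 100))   # batch of 64 inputs, 100 features each
W = rng.standard_normal((100, 50))   # layer weights: 100 inputs -> 50 units
b = np.zeros(50)                     # biases

H = np.maximum(0, X @ W + b)         # one matrix multiply + ReLU = one layer
print(H.shape)                       # (64, 50): 50 activations per example
```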
#
A neural network: layers and layers and layers of linear models and non-linear transformations.
Around for about 50 years; fell into disfavor in the 90s. In the last few years, a big resurgence: impressive accuracy on several benchmark problems, powered by huge datasets, GPUs, and modeling/learning algorithm improvements.
#
Convolutional Neural Nets: strategic removal of edges
Input Layer → Hidden Layer
#
Convolutional Neural Nets
[Figure: convolution animation, http://ufldl.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif]
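The operation the animation shows is simple to write down: slide a small filter over the image and take a dot product at each position. A minimal sketch (no padding, stride 1, toy inputs):

```python
# A minimal "valid" 2D convolution: slide the kernel over the image
# and take the elementwise product-sum at each position.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16.0).reshape(4, 4)  # toy 4x4 "image" ramp
edge = np.array([[1.0, -1.0]])       # simple horizontal edge filter
print(conv2d(img, edge))             # every entry is -1.0 for this ramp
```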
#
Pooling layer
(Ranzato, LSVR tutorial @ CVPR 2014, www.cs.toronto.edu/~ranzato)
#
Pooling layer
[Figure: pooling schematic, http://ufldl.stanford.edu/wiki/images/6/6c/Pooling_schematic.gif]
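Max pooling, as in the schematic, downsamples by taking the maximum over non-overlapping windows. A sketch with 2x2 windows on a toy input:

```python
# Max pooling over non-overlapping 2x2 windows, via a reshape trick:
# split each spatial axis into (blocks, within-block) and max over
# the within-block axes.
import numpy as np

def max_pool(x, size=2):
    h, w = x.shape
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [5, 6, 4, 0]])
print(max_pool(x))  # [[4 8]
                    #  [9 4]]
```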
#
Final Network
Krizhevsky et al. 12
#
Applications to computer vision
Image features: features = local detectors, combined to make a prediction. (In reality, features are more low-level.)
Eye + Eye + Nose + Mouth → Face!
#
Standard image classification approach:
Input → Extract features → Use simple classifier (e.g., logistic regression, SVMs) → Face
#
Many hand-crafted features exist… but they are very painful to design.
#
Change the image classification approach?
Input → Extract features → Use simple classifier (e.g., logistic regression, SVMs) → Face
Can we learn features from data?
#
Use a neural network to learn features
Input → Learned hierarchy → Output
(Lee et al., "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations", ICML 2009)
#
Sample results: traffic sign recognition (GTSRB), 99.2% accuracy; house number recognition (Google), 94.3% accuracy
Krizhevsky et al. '12: 60M parameters, won the 2012 ImageNet competition
ImageNet 2012 competition: 1.2M images, 1000 categories
#
Application to scene parsing
#
A quick demo!
Challenges of deep learning
Deep learning score card
Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains on computer vision, speech recognition, and some text analysis
- Potential for much more impact
Cons: …
Deep learning workflow: lots of labeled data
Split into training set (80%) and validation set (20%) → learn deep neural net model → validate → adjust hyper-parameters, model architecture, … and repeat
Deep learning score card
Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains on computer vision, speech recognition, and some text analysis
- Potential for much more impact
Cons:
- Computationally really expensive
- Requires a lot of data for high accuracy
- Extremely hard to tune: choice of architecture, parameter types, hyperparameters, learning algorithm
- Computational cost + so many choices = incredibly hard to tune
Can we do better?
Input → Learned hierarchy → Output
(Lee et al., ICML 2009)
#
Deep features: deep learning + transfer learning
Transfer learning: use data from one domain to help learn in another.
An old idea, explored for deep learning by Donahue et al. '14.
#
What's learned in a neural net
In a neural net trained for Task 1: the early layers are more generic and can be used as a feature extractor, vs. the later layers, which are very specific to Task 1.
#
Transfer learning in more detail
Take a neural net trained for Task 1. The later layers are very specific to Task 1; the earlier, more generic layers can be used as a feature extractor. Keep their weights fixed! For Task 2, learn only the end part: a simple classifier (e.g., logistic regression, SVMs) → Class?
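The recipe can be sketched end-to-end. Everything here is a toy stand-in: `W_fixed` plays the role of frozen pretrained weights (a real setup would load a trained ImageNet network), and the "simple classifier" is logistic regression trained by gradient descent on the extracted features:

```python
# Transfer-learning sketch: a FIXED feature extractor (its weights are
# never updated) followed by a simple classifier learned from scratch.
import numpy as np

rng = np.random.default_rng(0)
W_fixed = rng.standard_normal((20, 8))   # stand-in for frozen pretrained weights

def deep_features(X):
    """Fixed extractor: run inputs through the frozen layer."""
    return np.maximum(0, X @ W_fixed)

# A small labeled dataset for the new task (Task 2).
X = rng.standard_normal((200, 20))
y = (X[:, 0] > 0).astype(float)

# Extract features once, standardize, then learn ONLY the end part:
# logistic regression by gradient descent.
F = deep_features(X)
F = (F - F.mean(axis=0)) / F.std(axis=0)
w, b = np.zeros(F.shape[1]), 0.0
for _ in range(1000):
    p = 1 / (1 + np.exp(-(F @ w + b)))
    w -= 0.1 * F.T @ (p - y) / len(y)
    b -= 0.1 * float(np.mean(p - y))

p = 1 / (1 + np.exp(-(F @ w + b)))
acc = float(np.mean((p > 0.5) == (y > 0.5)))
print(acc)  # noticeably better than chance on the training data
```

With a real pretrained network the extractor would be far better, but the shape of the workflow is the same: features are computed once and only the small model on top is trained.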
#
Using an ImageNet-trained network as an extractor for general features
Using the classic AlexNet architecture pioneered by Alex Krizhevsky et al. in "ImageNet Classification with Deep Convolutional Neural Networks". It turns out that a neural network trained on ~1 million images of about 1000 classes makes a surprisingly general feature extractor. First illustrated by Donahue et al. in "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition".
#
Caltech-101
#
Deep Features and Logistic Regression
#
Transfer learning with deep features
Some labeled data → extract features with a neural net trained on a different task → split into training set (80%) and validation set (20%) → learn simple model → validate → deploy in production
Demo
What else can we do with Deep Features?
Finding similar images
Applications to text data
Simple text classification with bag of words
One feature per word, e.g.: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, …, gas 1, …, oil 1, …, Zaire 0
Use simple classifier (e.g., logistic regression, SVMs) → Class?
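A bag-of-words featurizer is a few lines of Python. The vocabulary below is loosely based on the slide's table and the email text is invented:

```python
# One count per vocabulary word: the bag-of-words representation.
from collections import Counter

vocab = ["aardvark", "about", "all", "africa", "apple", "anxious",
         "gas", "oil", "zaire"]

def bag_of_words(text):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

email = "All about oil all the gas news about Africa"
print(bag_of_words(email))  # [0, 2, 2, 1, 0, 0, 1, 1, 0]
```

A real featurizer would also strip punctuation and handle unseen words; whitespace splitting keeps the sketch minimal.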
Word2Vec: a neural network for finding word representations (Mikolov et al. '13)
Skip-gram model: from a word, predict nearby words in the sentence, e.g. from "dog" in "A dog went for a walk". The hidden representation can be viewed as deep features.
Word2Vec: a neural network for finding a high-dimensional representation per word (Mikolov et al. '13)
(See http://www.folgertkarsdorp.nl/word2vec-an-introduction/)
Related words are placed nearby in the high-dimensional space.
Projecting the 300-dim space into 2 dims with PCA (Mikolov et al. '13)
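"Related words nearby in the space" is usually operationalized as nearest neighbors under cosine similarity. The tiny 3-dimensional vectors below are invented for illustration; real Word2Vec vectors are ~300-dimensional:

```python
# Nearest-neighbor lookup over word vectors by cosine similarity.
# The vectors are toy values, not trained embeddings.
import numpy as np

vectors = {
    "dog":  np.array([0.9, 0.1, 0.0]),
    "cat":  np.array([0.8, 0.2, 0.1]),
    "car":  np.array([0.0, 0.9, 0.4]),
    "walk": np.array([0.5, 0.0, 0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(word):
    """Return the other word whose vector is most similar."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("dog"))  # "cat" is closest to "dog" in this toy space
```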
#
Blog corpus. Closest words in the 300-dim space to "haha": yea, hahaha, hahah, lisxc, umm, hehe; "laughingoutloud" → LOL.
Predicts the gender of the author with 79% accuracy.
ML in production (or: how this is relevant to data scientists)
2015: Production ML Pipeline (using deep learning)
DATA → Data cleaning & feature eng → ML Algorithm → Offline eval & parameter search → Deploy model → Your web service or intelligent app
Data engineering | Data intelligence | Deployment
Goal: a platform to help implement, manage, and optimize the entire pipeline
In real life
Take Home Message
Use simple classifier (e.g., logistic regression, SVMs) on deep features → Class?
Deep features are remarkable!
#
CONF.DATO.COM
#
Dato Office Hours @ Galvanize SF
Bring your laptop & some data & we'll help you get started
When: Thurs (tomorrow) 2:30p-5p, followed by beers
Where: Galvanize, 44 Tehama St. (SOMA) in SF
Talk to me / email me: [email protected]
#
Get the software: dato.com/download
Learn: dato.com/learn
Learn more: blog.dato.com
Join us: we're hiring lots!
Contact me: [email protected]
Go create something! [with Dato]
Fast & scalable; rich data type support; visualization
App-oriented ML; supporting utils; extensibility
Batch & always-on; RESTful interface; elastic & robust
#