Deep Learning Made Easy with Deep Features

Piotr Teterwak, Machine Learning Engineer, Dato

Transcript of Deep Learning Made Easy with Deep Features



Hello, my name is Piotr Teterwak. I'm a Machine Learning Engineer at Dato.


Who is Dato?


GraphLab Create: Production ML Pipeline

DATA → your web service or intelligent app, via:
- ML algorithm
- Data cleaning & feature engineering
- Offline evaluation & parameter search
- Deploy model

These stages span data engineering, data intelligence, and deployment. Goal: a platform to help implement, manage, and optimize the entire pipeline.

Deep Learning

Today's talk

Features are key to machine learning

Simple example: spam filtering. A user opens an email. Will she think it's spam?

What's the probability the email is spam? Input x (text of email, user info, source info) → MODEL → output: probability of y (Yes! / No).

Feature engineering: the painful black art of transforming raw inputs into useful inputs for an ML algorithm, e.g., important words or complex transformations of the input. Input x (text of email, user info, source info) → feature extraction → features φ(x) → MODEL → output: probability of y (Yes! / No).

Deep Learning for Learning Features

Linear classifiers: the most common classifiers, e.g., logistic regression and SVMs.

The decision boundary corresponds to a hyperplane (a line in high-dimensional space):

w0 + w1 x1 + w2 x2 = 0

Predict one class where w0 + w1 x1 + w2 x2 > 0, the other where w0 + w1 x1 + w2 x2 < 0.
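A minimal sketch of this decision rule. The weight values here are purely illustrative (they happen to implement AND, as a later slide shows):

```python
import numpy as np

# Illustrative weights for a 2-input linear classifier: w0 = -1.5, w1 = w2 = 1.
w0, w = -1.5, np.array([1.0, 1.0])

def predict(x):
    """Output 1 if the point lies on the positive side of the hyperplane, else 0."""
    return 1 if w0 + np.dot(w, x) > 0 else 0
```

With these weights the classifier fires only when both inputs are 1.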


What can a simple linear classifier represent?

AND

What can a simple linear classifier represent?

OR

What can't a simple linear classifier represent?

XOR

Need non-linear features.

Non-linear feature embedding


Graph representation of a classifier (useful for defining neural networks): inputs x1, x2, ..., xd connect to output y1 with weights w0, w1, w2, ..., wd.

w0 + w1 x1 + w2 x2 + ... + wd xd: if > 0, output 1; if < 0, output 0.

What can a linear classifier represent?

x1 OR x2: y = 1 if -0.5 + x1 + x2 > 0, else 0
x1 AND x2: y = 1 if -1.5 + x1 + x2 > 0, else 0

Solving the XOR problem: adding a layer.

XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)

z1 = 1 if -0.5 + x1 - x2 > 0, else 0
z2 = 1 if -0.5 - x1 + x2 > 0, else 0
y = 1 if -0.5 + z1 + z2 > 0, else 0

Each unit is thresholded to 0 or 1.
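The two-layer construction on this slide can be written out directly; a small sketch using thresholded units with the slide's weights:

```python
import numpy as np

def unit(bias, weights, inputs):
    """A single linear threshold unit: output 1 if bias + w.x > 0, else 0."""
    return 1 if bias + np.dot(weights, inputs) > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: z1 = x1 AND NOT x2, z2 = NOT x1 AND x2
    z1 = unit(-0.5, [1, -1], [x1, x2])
    z2 = unit(-0.5, [-1, 1], [x1, x2])
    # Output layer: z1 OR z2
    return unit(-0.5, [1, 1], [z1, z2])
```

Checking all four inputs confirms this computes XOR, which no single linear unit can.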

(Network diagram: http://deeplearning.stanford.edu/wiki/images/4/40/Network3322.png)

Deep Neural Networks

Outputs: P(cat|x), P(dog|x)

Deep Neural Networks: can model any function with enough hidden units. This is tremendously powerful: given enough units, it is possible to train a neural network to solve arbitrarily difficult problems. But such networks are also very difficult to train: too many parameters means too much memory and computation time.

Neural nets and GPUs: many operations in neural net training can happen in parallel. Training reduces to matrix operations, many of which are easily parallelized on a GPU.
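To see why, note that a fully connected layer's forward pass is just a matrix multiply plus a nonlinearity. A NumPy sketch with illustrative sizes (on a GPU, the same matmul runs in parallel):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 100))   # a batch of 64 inputs with 100 features each
W = rng.standard_normal((100, 32))   # weights for 32 hidden units
b = np.zeros(32)                     # biases
H = np.maximum(0, X @ W + b)         # ReLU activation; one matmul gives shape (64, 32)
```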

A neural network: layers and layers and layers of linear models and non-linear transformations.

Neural networks have been around for about 50 years. They fell into disfavor in the 90s; in the last few years there has been a big resurgence, with impressive accuracy on several benchmark problems, powered by huge datasets, GPUs, and improvements in modeling and learning algorithms.

Convolutional Neural Nets: strategic removal of edges between the input layer and the hidden layer, so each hidden unit connects only to a local patch of the input.


Convolutional Neural Nets: [animation] http://ufldl.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif
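The convolution in that animation slides a small kernel over the image. A minimal NumPy sketch (technically cross-correlation, as most CNN libraries implement it):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image, sum of elementwise products."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```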

Pooling layer. (Ranzato, LSVR tutorial @ CVPR 2014, www.cs.toronto.edu/~ranzato)

Pooling layer: [illustration] http://ufldl.stanford.edu/wiki/images/6/6c/Pooling_schematic.gif
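Max pooling, as in the illustration, keeps only the largest value in each block. A minimal NumPy sketch for non-overlapping pooling:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    h, w = feature_map.shape
    h, w = h // size * size, w // size * size     # trim to a multiple of the pool size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))
```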

Final network (Krizhevsky et al. '12).

Applications to computer vision

Image features: features act as local detectors that are combined to make a prediction. For a face: an eye, an eye, a nose, and a mouth together signal "Face!" (In reality, learned features are more low-level.)

Standard image classification approach: Input → extract features → use a simple classifier (e.g., logistic regression, SVMs) → "Face".

Many hand-crafted features exist, but they are very painful to design.

Change the image classification approach? Input → extract features → use a simple classifier (e.g., logistic regression, SVMs) → "Face".

Can we learn the features from data?

Use a neural network to learn features: Input → learned hierarchy of features → Output. (Lee et al., "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations", ICML 2009)

Sample results:
- Traffic sign recognition (GTSRB): 99.2% accuracy
- House number recognition (Google): 94.3% accuracy

Krizhevsky et al. '12: 60M parameters, won the 2012 ImageNet competition (1.2M images, 1,000 categories).

Application to scene parsing. (Carlos Guestrin 2005-2014)

A quick demo!

Challenges of deep learning

Deep learning score card.

Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains in computer vision, speech recognition, and some text analysis
- Potential for much more impact

Deep learning workflow: start with lots of labeled data, split into an 80% training set and a 20% validation set. Learn a deep neural net model on the training set, validate, then adjust hyper-parameters and model architecture and repeat.
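The 80/20 split step can be sketched as a simple shuffle-and-hold-out (function and variable names here are illustrative, not a Dato API):

```python
import numpy as np

def train_val_split(X, y, val_frac=0.2, seed=0):
    """Shuffle the data and hold out a validation set (80/20 split by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    val, train = idx[:n_val], idx[n_val:]
    return X[train], y[train], X[val], y[val]
```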

Deep learning score card.

Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains in computer vision, speech recognition, and some text analysis
- Potential for much more impact

Cons:
- Computationally really expensive
- Requires a lot of data for high accuracy
- Extremely hard to tune: choice of architecture, parameter types, hyperparameters, learning algorithm
- Computational cost + so many choices = incredibly hard to tune

Can we do better?


Deep features: deep learning + transfer learning

Transfer learning: use data from one domain to help learn on another. An old idea, explored for deep learning by Donahue et al. '14.

What's learned in a neural net? In a neural net trained for Task 1, the later layers are very specific to Task 1, while the earlier layers are more generic and can be used as a feature extractor.

Transfer learning in more detail: take the neural net trained for Task 1. Keep the weights of the generic early layers fixed and use them as a feature extractor. For Task 2, learn only the end part: feed the extracted features to a simple classifier (e.g., logistic regression, SVMs) to predict the class.
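A sketch of that idea: the array below stands in for activations from a pretrained network's generic layers (random vectors here, purely for illustration), and we train only a logistic-regression head on top of them:

```python
import numpy as np

rng = np.random.default_rng(0)
deep_features = rng.standard_normal((200, 50))    # 200 examples, 50-dim "deep features"
labels = (deep_features[:, 0] > 0).astype(float)  # toy binary task for Task 2

# Train only the end part: plain gradient descent on logistic loss.
w = np.zeros(50)
for _ in range(500):
    p = 1 / (1 + np.exp(-deep_features @ w))      # sigmoid predictions
    w -= 0.1 * deep_features.T @ (p - labels) / len(labels)

accuracy = ((deep_features @ w > 0) == labels).mean()
```

The pretrained extractor never changes; only the small weight vector w is learned.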

Using an ImageNet-trained network as an extractor for general features: we use the classic AlexNet architecture pioneered by Alex Krizhevsky et al. in "ImageNet Classification with Deep Convolutional Neural Networks". It turns out that a neural network trained on ~1 million images from about 1,000 classes makes a surprisingly general feature extractor, as first illustrated by Donahue et al. in "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition".

Caltech-101

Deep Features and Logistic Regression

Transfer learning with deep features: take some labeled data and split it into an 80% training set and a 20% validation set. Extract features with a neural net trained on a different task, learn a simple model on those features, validate, and deploy in production.

Demo

What else can we do with Deep Features?

Finding similar images

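Finding similar images with deep features reduces to nearest neighbors in feature space; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def most_similar(query_vec, feature_matrix, k=3):
    """Return indices of the k nearest images by Euclidean distance in feature space."""
    dists = np.linalg.norm(feature_matrix - query_vec, axis=1)
    return np.argsort(dists)[:k]
```

In practice each row of the feature matrix would be the deep-feature vector of one image.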

Applications to text data

Simple text classification with bag of words: one feature per word.

aardvark  0
about     2
all       2
Africa    1
apple     0
anxious   0
...
gas       1
...
oil       1
Zaire     0

Feed the word counts to a simple classifier (e.g., logistic regression, SVMs) to predict the class.
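A word-count table like the one above can be computed in a few lines (a hypothetical helper, not a Dato API):

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """One feature per word: the count of each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    return [counts[word.lower()] for word in vocabulary]
```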

Word2Vec: a neural network for finding word representations (Mikolov et al. '13). The skip-gram model predicts nearby words in the sentence from a given word, e.g., from "dog" in "A dog went for a walk". The representations learned by the neural net can be viewed as deep features.

Word2Vec finds a high-dimensional representation per word (Mikolov et al. '13). For an introduction, see http://www.folgertkarsdorp.nl/word2vec-an-introduction/

Related words are placed nearby in the high-dimensional space. (Visualized by projecting the 300-dim space into 2 dims with PCA; Mikolov et al. '13.)
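"Nearby in the high-dimensional space" is usually measured with cosine similarity. A toy sketch with made-up 3-dim vectors (real Word2Vec vectors are, e.g., 300-dim):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors; near 1 means similar direction."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Made-up embeddings for illustration only.
vec = {
    "haha": np.array([0.9, 0.1, 0.0]),
    "hehe": np.array([0.8, 0.2, 0.1]),
    "oil":  np.array([0.0, 0.1, 0.9]),
}
```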

Blog corpus: the closest words to "LOL" in the 300-dim space include Haha, Yea, Hahaha, Hahah, Lisxc, Umm, Hehe, and laughingoutloud. These features predict the gender of the author with 79% accuracy.

ML in production (or how this is relevant to data scientists)

2015: Production ML pipeline. The same picture as before: DATA flows to your web service or intelligent app via the ML algorithm, data cleaning & feature engineering, offline evaluation & parameter search, and model deployment, spanning data engineering, data intelligence, and deployment, now using deep learning. Goal: a platform to help implement, manage, and optimize the entire pipeline.

In real life

Take Home Message


Deep features + a simple classifier (e.g., logistic regression, SVMs) → class. Deep features are remarkable!


CONF.DATO.COM


Dato Office Hours @ Galvanize SF. Bring your laptop and some data and we'll help you get started. When: Thursday (tomorrow), 2:30p-5p, followed by beers. Where: Galvanize, 44 Tehama St. (SOMA) in SF.

Talk to me/email me: [email protected]


Get the software: dato.com/download

Learn: dato.com/learn

Learn more: blog.dato.com

Join us: we're hiring lots!

Contact me: [email protected]

Go create something! [with Dato]

Fast & scalable; rich data type support; visualization.
App-oriented ML; supporting utils; extensibility.
Batch & always-on; RESTful interface; elastic & robust.