Transcript of Alex Tellez, Deep Learning Applications

Page 1: Alex Tellez, Deep Learning Applications

DEEP LEARNING APPLICATIONS
Mills College, 3/12/2015
Alex Tellez - [email protected]

Page 2: Alex Tellez, Deep Learning Applications

H2O - MORE THAN WATER

What is H2O? (water, duh!)

It is ALSO an open-source, parallel processing engine for machine learning.

What makes H2O different?

Cutting-edge algorithms + parallel architecture + ease-of-use

= Happy Data Scientists / Analysts

Page 3: Alex Tellez, Deep Learning Applications

TEAM @ H2O.AI

16,000 commits

H2O World Conference 2014

Page 4: Alex Tellez, Deep Learning Applications

COMMUNITY REACH

120 meetups in 2014
11,000 installations
2,000 corporations
First Friday Hack-A-Thons

Page 5: Alex Tellez, Deep Learning Applications

TRY IT!

Don’t take my word for it… www.h2o.ai

Simple Instructions

1. cd to the download location
2. unzip the h2o file
3. java -jar h2o.jar
4. Point browser to: localhost:54321

GUI

R
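Besides the browser GUI and R, H2O can also be driven from Python. A minimal sketch of connecting to the instance started above (this assumes the `h2o` Python package, which is not shown in the talk):

```python
# Minimal sketch: connect to the locally running H2O instance from Python.
# Assumes the `h2o` Python package is installed (pip install h2o); not part of the talk.
import h2o

# Connects to a cluster already listening on localhost:54321,
# or starts a fresh local one if nothing is running there.
h2o.init(ip="localhost", port=54321)
```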

Page 6: Alex Tellez, Deep Learning Applications

SUPERVISED LEARNING

Deep Learning Applications on Labeled Data

Page 7: Alex Tellez, Deep Learning Applications

SUPERVISED LEARNING
What is it?

Methods that infer a function from labeled training data. Key task: Predicting ________. (Insert your task here)

Examples of supervised learning tasks:

1. Classification Tasks - Benign / Malignant tumor
2. Regression Tasks - Predicting future stock market prices
3. Image Recognition - Highlighting faces in pictures

Page 8: Alex Tellez, Deep Learning Applications

SUPERVISED ALGORITHMS

Statistical Analysis
•  Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
•  Cox Proportional Hazards Models
•  Naïve Bayes

Ensembles
•  Distributed Random Forest: Classification or regression models
•  Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations

Deep Neural Networks
•  Deep Learning: Creates multi-layer feedforward neural networks, starting with an input layer followed by multiple layers of nonlinear transformations

VERY HOT subject area & our topic today!

Page 9: Alex Tellez, Deep Learning Applications

WHY NEURAL NETS?

[Figure: Linear Classification vs. Non-Linear Classification, with the classification error shown for each]

Page 10: Alex Tellez, Deep Learning Applications

NEURAL NETS + H2O

[Figure: feed-forward network diagram - input neurons x1…x4 feed hidden features h1…h3, which feed outputs y1 and y2]

Neurons activate each other via weighted sums

Activation Functions H2O Supports:

Tanh

Rectifier

Maxout

Page 11: Alex Tellez, Deep Learning Applications

FINDING THE HIGGS-BOSON

Task: Can we identify the Higgs-Boson particle vs. background noise using ‘low-level’ machine generated data?

Live Demo!

CERN Lab
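The live demo itself isn’t captured in the transcript; below is a hedged sketch of the same flow in the h2o Python API. The file path is a placeholder and the hyper-parameters are illustrative rather than the settings used on stage.

```python
# Sketch of the Higgs demo: signal vs. background classification with H2O deep learning.
# The CSV path is a placeholder; HIGGS-style data has the label first, then low-level features.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
higgs = h2o.import_file("higgs.csv")            # placeholder path

response = higgs.columns[0]                     # assume the label is the first column
predictors = higgs.columns[1:]
higgs[response] = higgs[response].asfactor()    # make it a classification problem

train, valid = higgs.split_frame(ratios=[0.8], seed=42)

dl = H2ODeepLearningEstimator(
    activation="Rectifier",    # one of the supported activations (Tanh, Rectifier, Maxout)
    hidden=[200, 200],         # illustrative layer sizes
    epochs=10,
)
dl.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# The appendix slide reports AUC ~0.73 on low-level features.
print(dl.model_performance(valid).auc())
```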

Page 12: Alex Tellez, Deep Learning Applications

FIGHTING CRIME IN CHICAGO

Spark + H2O

Page 13: Alex Tellez, Deep Learning Applications

OPEN CITY, OPEN DATA

“…my kind of town” - F. Sinatra

~4.6 million rows of crimes from 2001 onward, updated weekly

External data source considerations???

Weather Data? U.S. Census Data?

Crime Data

Page 14: Alex Tellez, Deep Learning Applications

ML WORKFLOW

1. Collect datasets (Crime + Weather + Census)
2. Do some feature extraction (e.g. dates, times) - sketched below
3. Join Crime Data + Weather Data + Census Data
4. Build deep learning model to predict arrest / no arrest made

GOAL: For a given crime, predict if an arrest is more / less likely to be made!
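Step 2 of the workflow (feature extraction on dates and times) could look roughly like this in pandas; the column name and the derived features are placeholders, not the exact ones used in the talk.

```python
# Sketch: derive simple date/time features from a crime timestamp column.
# Column names are placeholders; the real Chicago export may differ.
import pandas as pd

crimes = pd.read_csv("chicago_crimes.csv")       # placeholder path
crimes["Date"] = pd.to_datetime(crimes["Date"])

crimes["Year"]    = crimes["Date"].dt.year
crimes["Month"]   = crimes["Date"].dt.month
crimes["WeekDay"] = crimes["Date"].dt.weekday    # 0 = Monday
crimes["Hour"]    = crimes["Date"].dt.hour
crimes["Weekend"] = (crimes["WeekDay"] >= 5).astype(int)
```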

Page 15: Alex Tellez, Deep Learning Applications

SPARK SQL + H2O RDD

3-table join using Spark SQL

Convert joined table to H2O RDD
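The slide’s actual code isn’t in the transcript (the talk used Spark alongside H2O); a rough PySpark equivalent of the 3-table join is sketched below. Table names and join keys are assumptions, and the hand-off to H2O would go through Sparkling Water, which is only noted in a comment here.

```python
# Sketch: join crime, weather, and census tables with Spark SQL (PySpark).
# File names, columns, and join keys are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("chicago-crime").getOrCreate()

spark.read.csv("crimes.csv", header=True, inferSchema=True).createOrReplaceTempView("crime")
spark.read.csv("weather.csv", header=True, inferSchema=True).createOrReplaceTempView("weather")
spark.read.csv("census.csv", header=True, inferSchema=True).createOrReplaceTempView("census")

joined = spark.sql("""
    SELECT c.*, w.mean_temp, s.pct_unemployed
    FROM crime c
    JOIN weather w ON c.crime_date     = w.weather_date
    JOIN census  s ON c.community_area = s.community_area
""")

# Handing `joined` to H2O would go through Sparkling Water (an H2OContext);
# that conversion is assumed here, not reproduced from the slide.
```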

Page 16: Alex Tellez, Deep Learning Applications

H2O DEEP LEARNING

Can do grid search over many parameters!
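A sketch of what such a grid search might look like in the h2o Python API; the input file, the `Arrest` response column, and the grid values are all placeholders rather than the talk’s actual configuration.

```python
# Sketch: grid search over deep learning hyper-parameters with h2o-py.
# File, column names, and grid values are illustrative placeholders.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
df = h2o.import_file("crime_weather_census.csv")     # placeholder: the joined table
df["Arrest"] = df["Arrest"].asfactor()
train, valid = df.split_frame(ratios=[0.8], seed=1)
predictors = [c for c in df.columns if c != "Arrest"]

hyper_params = {
    "hidden":     [[100, 100], [200, 200], [500, 500]],
    "activation": ["Rectifier", "RectifierWithDropout", "Tanh"],
    "l1":         [0, 1e-5],
}

grid = H2OGridSearch(H2ODeepLearningEstimator(epochs=10), hyper_params)
grid.train(x=predictors, y="Arrest", training_frame=train, validation_frame=valid)

# Pick the model with the best validation AUC.
best = grid.get_grid(sort_by="auc", decreasing=True).models[0]
```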

Page 17: Alex Tellez, Deep Learning Applications

HOW’D WE DO?

nice!

~ 10 mins

Page 18: Alex Tellez, Deep Learning Applications

MODEL BUILDING + TUNING

DReD Net = Deep Rectifier w/ Dropout Neural Net

[Figure: network diagram - inputs pass through hidden layers with dropped-out neurons (marked X) to the “Arrest” output]

Epochs, hidden layers, regularization
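A hedged sketch of a DReD net in h2o-py, reusing the `train` / `valid` / `predictors` placeholders from the grid-search sketch above; every hyper-parameter value is illustrative, not the tuned setting from the talk.

```python
# Sketch: "DReD Net" = deep rectifier network with dropout, plus the tuning
# knobs named on the slide (epochs, hidden layers, regularization).
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dred = H2ODeepLearningEstimator(
    activation="RectifierWithDropout",       # rectifier units with dropout
    hidden=[200, 200, 200],                  # hidden layers (illustrative)
    input_dropout_ratio=0.2,                 # dropout on the input layer
    hidden_dropout_ratios=[0.5, 0.5, 0.5],   # dropout per hidden layer
    epochs=50,                               # passes over the training data
    l1=1e-5, l2=1e-5,                        # weight regularization
)
dred.train(x=predictors, y="Arrest", training_frame=train, validation_frame=valid)
```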

Page 19: Alex Tellez, Deep Learning Applications

UNSUPERVISED LEARNING

Deep Learning Applications on Non-Labeled Data

Page 20: Alex Tellez, Deep Learning Applications

UNSUPERVISED LEARNING
What is it?

Methods to understand the general structure of input data where no prediction is needed.

Examples of unsupervised learning tasks:

1. Clustering - Discovering customer segments
2. Topic Extraction - What topics are people tweeting about?
3. Information Retrieval - IBM Watson: Question + Answer
4. Anomaly Detection - Detecting irregular heart-beats

NO CURATION NEEDED!

Page 21: Alex Tellez, Deep Learning Applications

UNSUPERVISED ALGORITHMS

Clustering
•  K-means: Partitions observations into k clusters/groups of the same spatial size

Dimensionality Reduction
•  Principal Component Analysis: Linearly transforms correlated variables to independent components

Anomaly Detection
•  Autoencoders: Find outliers using nonlinear dimensionality reduction via deep learning

Page 22: Alex Tellez, Deep Learning Applications

AUTOENCODER + H2O

[Figure: autoencoder diagram - inputs x1…x4 flow through hidden features and are reconstructed as outputs x1…x4]

Dogs, Dogs and Dogs

Page 23: Alex Tellez, Deep Learning Applications

ANOMALY DETECTION OF VINTAGE YEAR BORDEAUX WINE

Page 24: Alex Tellez, Deep Learning Applications

BORDEAUX WINE

Largest wine-growing region in France

700+ million bottles of wine produced / year!

Some years better than others: Great ($$$) vs. Typical ($)
Last Great years: 2010, 2009, 2005, 2000

Page 25: Alex Tellez, Deep Learning Applications

GREAT VS. TYPICAL VINTAGE?

Question: Can we study weather patterns in Bordeaux leading up to harvest to identify ‘anomalous’ weather years >> correlated with Great ($$$) vs. Typical ($) vintages?

The Bordeaux Dataset (1952 - 2014 Yearly)

Amount of Winter Rain (Oct > Apr of harvest year)
Average Summer Temp (Apr > Sept of harvest year)
Rain during Harvest (Aug > Sept)
Years since last Great Vintage

Page 26: Alex Tellez, Deep Learning Applications

AUTOENCODER + ANOMALY DETECTION

ML Workflow:

1) Train autoencoder to learn the ‘typical’ vintage weather pattern
2) Append ‘great’ vintage year weather data to the original dataset
3) IF the great-vintage weather data does NOT match the learned weather pattern, the autoencoder will produce a high reconstruction error (MSE)

Goal:

‘en primeur of en primeur’ - Can we use weather patterns to identify anomalous years >> indicating great vintage quality?
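A sketch of that workflow with an H2O autoencoder; the file names, feature set, and layer sizes are assumptions, and only the MSE > 0.10 threshold comes from the results slide that follows.

```python
# Sketch: anomaly detection on vintage-year weather with an H2O autoencoder.
# File names and hyper-parameters are placeholders; the dataset itself isn't published here.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
typical = h2o.import_file("bordeaux_typical_years.csv")   # placeholder: 'typical' vintages
great   = h2o.import_file("bordeaux_great_years.csv")     # placeholder: 'great' vintages

ae = H2ODeepLearningEstimator(
    autoencoder=True,          # learn to reconstruct the inputs (no label)
    activation="Tanh",
    hidden=[3],                # small bottleneck; illustrative
    epochs=100,
)
ae.train(x=typical.columns, training_frame=typical)

# Per-row reconstruction MSE; years well above the slide's 0.10 cut-off get flagged.
mse_typical = ae.anomaly(typical)
mse_great   = ae.anomaly(great)
```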

Page 27: Alex Tellez, Deep Learning Applications

RESULTS (MSE > 0.10)

[Figure: mean square error by vintage year - vintages above the 0.10 threshold include 1961, 1982, 1989, 1990, 2000, 2005, 2009, and 2010]

Page 28: Alex Tellez, Deep Learning Applications

2014 BORDEAUX??

[Figure: mean square error for the 2013 and 2014 vintages - is 2014 anomalous?]

Page 29: Alex Tellez, Deep Learning Applications

DEEP AUTOENCODERS + K-MEANS EXAMPLE

Help cyclists with their health-related questions!

Page 30: Alex Tellez, Deep Learning Applications

CYCLING + __________

Problem:

New and Experienced Cyclists have questions about cycling + ______ (given topic). Let’s build a question + answer system to help!

ML Workflow:

1) Scrape thousands of article titles from the internet about cycling / cycling tips / cycling health, etc. from various sources.

2) Build Bag-of-Words Dataset on article titles corpus

3) Reduce # of dimensions via deep autoencoder

4) Extract ‘last layer’ of deep features and cluster using k-means

5) Inspect Results!

Page 31: Alex Tellez, Deep Learning Applications

BAG-OF-WORDS

Build dataset of cycling-related articles from various sources:

Article Title: “The Basics of Exercise Nutrition”
Pre-processing: lower case, remove ‘stopwords’, remove punctuation
Result: [ 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, …, 0 ]  (1s in the “basics”, “exercise”, “nutrition” columns)
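One way to build that bag-of-words matrix is scikit-learn’s CountVectorizer; the talk doesn’t say which tooling was used, so treat this as an illustrative sketch only.

```python
# Sketch: binary bag-of-words over article titles (lower-case, stopwords removed,
# punctuation dropped by the tokenizer), roughly matching the slide's pre-processing.
from sklearn.feature_extraction.text import CountVectorizer

titles = [
    "The Basics of Exercise Nutrition",
    # ...thousands more scraped cycling-related titles
]

vectorizer = CountVectorizer(
    lowercase=True,          # lower case
    stop_words="english",    # remove 'stopwords'
    binary=True,             # 0/1 presence flags, as in the slide's example vector
)
bow = vectorizer.fit_transform(titles)   # rows = titles, columns = vocabulary (~2,700 words)
```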

Page 32: Alex Tellez, Deep Learning Applications

DIMENSIONALITY REDUCTION

Use deep autoencoder to reduce # features (~2,700 words!)

[Figure: deep autoencoder - Encoder: 2,700 words > 500 hidden features > 250 H.F. > 125 H.F. > 50; Decoder: 50 > 125 H.F. > 250 H.F. > 500 hidden features > 2,700 words]

Example input: “The Basics of Exercise Nutrition”
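In h2o-py the layer sizes from the slide map onto the `hidden` parameter of an autoencoder; the input file is a placeholder and the epoch count is illustrative.

```python
# Sketch: deep autoencoder 2,700 -> 500 -> 250 -> 125 -> 50 -> 125 -> 250 -> 500 -> 2,700.
# Only the hidden layers are listed; input/output width comes from the frame itself.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
bow_hf = h2o.import_file("cycling_bag_of_words.csv")   # placeholder: ~2,700 word columns

deep_ae = H2ODeepLearningEstimator(
    autoencoder=True,
    activation="Tanh",
    hidden=[500, 250, 125, 50, 125, 250, 500],   # layer sizes from the slide
    epochs=20,                                   # illustrative
)
deep_ae.train(x=bow_hf.columns, training_frame=bow_hf)
```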

Page 33: Alex Tellez, Deep Learning Applications

K-MEANS CLUSTERING

For each article: Extract ‘last’ layer of autoencoder (50 deep features)

Example: “The Basics of Exercise Nutrition” > 50 ‘deep features’
DF1: -0.09330833, DF2: 0.167881429, DF3: -0.234307408, DF4: 0.247723639, DF5: -0.067700267, DF6: -0.094107866, …

K-Means Clustering
Inputs: Extracted 50 deep features for each cycling-related article
K = 50 clusters after grid-search of values
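Extracting the 50-unit bottleneck and clustering it might look like this, reusing `deep_ae` and `bow_hf` from the previous sketch; the layer index and seed are assumptions.

```python
# Sketch: take the 50-unit 'deep features' layer and cluster with k-means (k = 50).
from h2o.estimators.kmeans import H2OKMeansEstimator

# deepfeatures() returns the activations of a chosen hidden layer; index 3 is
# assumed to address the 50-unit bottleneck in the [500, 250, 125, 50, ...] stack.
deep_features = deep_ae.deepfeatures(bow_hf, 3)

km = H2OKMeansEstimator(k=50, seed=1)
km.train(training_frame=deep_features)

clusters = km.predict(deep_features)   # one cluster id per article title
```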

Page 34: Alex Tellez, Deep Learning Applications

RESULT: CYCLING + A.I.

Now we inspect the clusters!

Test Article Title: Fluid & Carbohydrate Ingestion Improve Performance During 1 Hour of Intense Exercise

Result: Clustered w/ 17 other titles (out of ~5,700)

Top 5 similar titles within cluster:

Caffeine ingestion does not alter performance during a 100-km cycling time-trial performance

Immuno-endocrine response to cycling following ingestion of caffeine and carbohydrate

Metabolism and performance following carbohydrate ingestion late in exercise

Increases in cycling performance in response to caffeine ingestion are repeatable

Fluid ingestion does not influence intense 1-h exercise performance in a mild environment

Page 35: Alex Tellez, Deep Learning Applications

HOW TO GET FASTER?

Test Article Title: Muscle Coordination is Key to Power Output & Mechanical Efficiency of Limb Movements

Result: Clustered w/ 29 other titles (out of ~5,700)

Top 5 similar titles within cluster:

Muscle fibre type efficiency and mechanical optima affect freely chosen pedal rate during cycling.

Standard mechanical energy analyses do not correlate with muscle work in cycling.

The influence of body position on leg kinematics and muscle recruitment during cycling.

Influence of repeated sprint training on pulmonary O2 uptake and muscle deoxygenation kinetics in humans

Influence of pedaling rate on muscle mechanical energy in low power recumbent pedaling using forward dynamic simulations

Page 36: Alex Tellez, Deep Learning Applications

WHAT’S NEXT??

Build smarter apps!!

[email protected]/h2oai

Hack with us!!

Page 37: Alex Tellez, Deep Learning Applications

HIGGS-BOSON PARTICLE

How did our Deep Neural Net do??

BEST Low-Level AUC: 0.73