Transcript of Alex Tellez, Deep Learning Applications

Page 1: Alex Tellez, Deep Learning Applications

DEEP LEARNING APPLICATIONS
Mills College, 3/12/2015
Alex Tellez - [email protected]

Page 2: Alex Tellez, Deep Learning Applications

H2O - MORE THAN WATER

What is H2O? (water, duh!)

It is ALSO an open-source, parallel processing engine for machine learning.

What makes H2O different?

Cutting-edge algorithms + parallel architecture + ease-of-use

= Happy Data Scientists / Analysts

Page 3: Alex Tellez, Deep Learning Applications

TEAM @ H2O.AI

16,000 commits

H2O World Conference 2014

Page 4: Alex Tellez, Deep Learning Applications

COMMUNITY REACH

120 meetups in 2014
11,000 installations
2,000 corporations
First Friday Hack-A-Thons

Page 5: Alex Tellez, Deep Learning Applications

TRY IT!

Don’t take my word for it… www.h2o.ai

Simple Instructions

1. cd to the download location
2. unzip the h2o file
3. java -jar h2o.jar
4. Point browser to: localhost:54321

GUI

R
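Besides the browser GUI and R, H2O can also be driven from Python. A minimal sketch of connecting to the instance started above (this assumes the `h2o` Python package, which is not shown in the talk):

```python
# Minimal sketch: connect to the locally running H2O instance from Python.
# Assumes the `h2o` Python package is installed (pip install h2o); not part of the talk.
import h2o

# Connects to a cluster already listening on localhost:54321,
# or starts a fresh local one if nothing is running there.
h2o.init(ip="localhost", port=54321)
```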

Page 6: Alex Tellez, Deep Learning Applications

SUPERVISED LEARNING

Deep Learning Applications on Labeled Data

Page 7: Alex Tellez, Deep Learning Applications

SUPERVISED LEARNING
What is it?

Methods that infer a function from labeled training data. Key task: Predicting ________. (Insert your task here)

Examples of supervised learning tasks:

1. Classification Tasks - Benign / Malignant tumor
2. Regression Tasks - Predicting future stock market prices
3. Image Recognition - Highlighting faces in pictures

Page 8: Alex Tellez, Deep Learning Applications

SUPERVISED ALGORITHMS

Statistical Analysis
•  Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
•  Cox Proportional Hazards Models
•  Naïve Bayes

Ensembles
•  Distributed Random Forest: Classification or regression models
•  Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations

Deep Neural Networks
•  Deep Learning: Creates multi-layer feedforward neural networks, starting with an input layer followed by multiple layers of nonlinear transformations

VERY HOT subject area & our topic today!

Page 9: Alex Tellez, Deep Learning Applications

WHY NEURAL NETS?

[Figure: Linear Classification vs. Non-Linear Classification, with the classification error shown for each]

Page 10: Alex Tellez, Deep Learning Applications

NEURAL NETS + H2O

[Figure: feed-forward network diagram - input neurons x1…x4 feed hidden features h1…h3, which feed outputs y1 and y2]

Neurons activate each other via weighted sums

Activation Functions H2O Supports:

Tanh

Rectifier

Maxout

Page 11: Alex Tellez, Deep Learning Applications

FINDING THE HIGGS-BOSON

Task: Can we identify the Higgs-Boson particle vs. background noise using ‘low-level’ machine generated data?

Live Demo!

CERN Lab
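The live demo itself isn’t captured in the transcript; below is a hedged sketch of the same flow in the h2o Python API. The file path is a placeholder and the hyper-parameters are illustrative rather than the settings used on stage.

```python
# Sketch of the Higgs demo: signal vs. background classification with H2O deep learning.
# The CSV path is a placeholder; HIGGS-style data has the label first, then low-level features.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
higgs = h2o.import_file("higgs.csv")            # placeholder path

response = higgs.columns[0]                     # assume the label is the first column
predictors = higgs.columns[1:]
higgs[response] = higgs[response].asfactor()    # make it a classification problem

train, valid = higgs.split_frame(ratios=[0.8], seed=42)

dl = H2ODeepLearningEstimator(
    activation="Rectifier",    # one of the supported activations (Tanh, Rectifier, Maxout)
    hidden=[200, 200],         # illustrative layer sizes
    epochs=10,
)
dl.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# The appendix slide reports AUC ~0.73 on low-level features.
print(dl.model_performance(valid).auc())
```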

Page 12: Alex Tellez, Deep Learning Applications

FIGHTING CRIME IN CHICAGO

Spark + H2O

Page 13: Alex Tellez, Deep Learning Applications

OPEN CITY, OPEN DATA

“…my kind of town” - F. Sinatra

~4.6 million rows of crimes from 2001 onward, updated weekly

External data source considerations???

Weather Data? U.S. Census Data?

Crime Data

Page 14: Alex Tellez, Deep Learning Applications

ML WORKFLOW

1. Collect datasets (Crime + Weather + Census)
2. Do some feature extraction (e.g. dates, times) - sketched below
3. Join Crime Data + Weather Data + Census Data
4. Build deep learning model to predict arrest / no arrest made

GOAL: For a given crime, predict if an arrest is more / less likely to be made!
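Step 2 of the workflow (feature extraction on dates and times) could look roughly like this in pandas; the column name and the derived features are placeholders, not the exact ones used in the talk.

```python
# Sketch: derive simple date/time features from a crime timestamp column.
# Column names are placeholders; the real Chicago export may differ.
import pandas as pd

crimes = pd.read_csv("chicago_crimes.csv")       # placeholder path
crimes["Date"] = pd.to_datetime(crimes["Date"])

crimes["Year"]    = crimes["Date"].dt.year
crimes["Month"]   = crimes["Date"].dt.month
crimes["WeekDay"] = crimes["Date"].dt.weekday    # 0 = Monday
crimes["Hour"]    = crimes["Date"].dt.hour
crimes["Weekend"] = (crimes["WeekDay"] >= 5).astype(int)
```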

Page 15: Alex Tellez, Deep Learning Applications

SPARK SQL + H2O RDD

3-table join using Spark SQL

Convert joined table to H2O RDD
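The slide’s actual code isn’t in the transcript (the talk used Spark alongside H2O); a rough PySpark equivalent of the 3-table join is sketched below. Table names and join keys are assumptions, and the hand-off to H2O would go through Sparkling Water, which is only noted in a comment here.

```python
# Sketch: join crime, weather, and census tables with Spark SQL (PySpark).
# File names, columns, and join keys are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("chicago-crime").getOrCreate()

spark.read.csv("crimes.csv", header=True, inferSchema=True).createOrReplaceTempView("crime")
spark.read.csv("weather.csv", header=True, inferSchema=True).createOrReplaceTempView("weather")
spark.read.csv("census.csv", header=True, inferSchema=True).createOrReplaceTempView("census")

joined = spark.sql("""
    SELECT c.*, w.mean_temp, s.pct_unemployed
    FROM crime c
    JOIN weather w ON c.crime_date     = w.weather_date
    JOIN census  s ON c.community_area = s.community_area
""")

# Handing `joined` to H2O would go through Sparkling Water (an H2OContext);
# that conversion is assumed here, not reproduced from the slide.
```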

Page 16: Alex Tellez, Deep Learning Applications

H2O DEEP LEARNING

Can do grid search over many parameters!
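A sketch of what such a grid search might look like in the h2o Python API; the input file, the `Arrest` response column, and the grid values are all placeholders rather than the talk’s actual configuration.

```python
# Sketch: grid search over deep learning hyper-parameters with h2o-py.
# File, column names, and grid values are illustrative placeholders.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
df = h2o.import_file("crime_weather_census.csv")     # placeholder: the joined table
df["Arrest"] = df["Arrest"].asfactor()
train, valid = df.split_frame(ratios=[0.8], seed=1)
predictors = [c for c in df.columns if c != "Arrest"]

hyper_params = {
    "hidden":     [[100, 100], [200, 200], [500, 500]],
    "activation": ["Rectifier", "RectifierWithDropout", "Tanh"],
    "l1":         [0, 1e-5],
}

grid = H2OGridSearch(H2ODeepLearningEstimator(epochs=10), hyper_params)
grid.train(x=predictors, y="Arrest", training_frame=train, validation_frame=valid)

# Pick the model with the best validation AUC.
best = grid.get_grid(sort_by="auc", decreasing=True).models[0]
```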

Page 17: Alex Tellez, Deep Learning Applications

HOW’D WE DO?

nice!

~ 10 mins

Page 18: Alex Tellez, Deep Learning Applications

MODEL BUILDING + TUNING

DReD Net = Deep Rectifier w/ Dropout Neural Net

[Figure: network diagram - inputs pass through hidden layers with dropped-out neurons (marked X) to the “Arrest” output]

Epochs, hidden layers, regularization
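A hedged sketch of a DReD net in h2o-py, reusing the `train` / `valid` / `predictors` placeholders from the grid-search sketch above; every hyper-parameter value is illustrative, not the tuned setting from the talk.

```python
# Sketch: "DReD Net" = deep rectifier network with dropout, plus the tuning
# knobs named on the slide (epochs, hidden layers, regularization).
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dred = H2ODeepLearningEstimator(
    activation="RectifierWithDropout",       # rectifier units with dropout
    hidden=[200, 200, 200],                  # hidden layers (illustrative)
    input_dropout_ratio=0.2,                 # dropout on the input layer
    hidden_dropout_ratios=[0.5, 0.5, 0.5],   # dropout per hidden layer
    epochs=50,                               # passes over the training data
    l1=1e-5, l2=1e-5,                        # weight regularization
)
dred.train(x=predictors, y="Arrest", training_frame=train, validation_frame=valid)
```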

Page 19: Alex Tellez, Deep Learning Applications

UNSUPERVISED LEARNING

Deep Learning Applications on Non-Labeled Data

Page 20: Alex Tellez, Deep Learning Applications

UNSUPERVISED LEARNING
What is it?

Methods to understand the general structure of input data where no prediction is needed.

Examples of unsupervised learning tasks:

1. Clustering - Discovering customer segments
2. Topic Extraction - What topics are people tweeting about?
3. Information Retrieval - IBM Watson: Question + Answer
4. Anomaly Detection - Detecting irregular heart-beats

NO CURATION NEEDED!

Page 21: Alex Tellez, Deep Learning Applications

UNSUPERVISED ALGORITHMS

Clustering
•  K-means: Partitions observations into k clusters/groups of the same spatial size

Dimensionality Reduction
•  Principal Component Analysis: Linearly transforms correlated variables to independent components

Anomaly Detection
•  Autoencoders: Find outliers using nonlinear dimensionality reduction via deep learning

Page 22: Alex Tellez, Deep Learning Applications

AUTOENCODER + H2O

[Figure: autoencoder diagram - inputs x1…x4 flow through hidden features and are reconstructed as outputs x1…x4]

Dogs, Dogs and Dogs

Page 23: Alex Tellez, Deep Learning Applications

ANOMALY DETECTION OF VINTAGE YEAR BORDEAUX WINE

Page 24: Alex Tellez, Deep Learning Applications

BORDEAUX WINE

Largest wine-growing region in France

700+ million bottles of wine produced / year!

Some years better than others: Great ($$$) vs. Typical ($)
Last Great years: 2010, 2009, 2005, 2000

Page 25: Alex Tellez, Deep Learning Applications

GREAT VS. TYPICAL VINTAGE?

Question: Can we study weather patterns in Bordeaux leading up to harvest to identify ‘anomalous’ weather years >> correlated with Great ($$$) vs. Typical ($) vintages?

The Bordeaux Dataset (1952 - 2014 Yearly)

Amount of Winter Rain (Oct > Apr of harvest year)
Average Summer Temp (Apr > Sept of harvest year)
Rain during Harvest (Aug > Sept)
Years since last Great Vintage

Page 26: Alex Tellez, Deep Learning Applications

AUTOENCODER + ANOMALY DETECTION

ML Workflow:

1) Train autoencoder to learn the ‘typical’ vintage weather pattern
2) Append ‘great’ vintage year weather data to the original dataset
3) IF the great-vintage weather data does NOT match the learned weather pattern, the autoencoder will produce a high reconstruction error (MSE)

Goal:

‘en primeur of en primeur’ - Can we use weather patterns to identify anomalous years >> indicating great vintage quality?
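A sketch of that workflow with an H2O autoencoder; the file names, feature set, and layer sizes are assumptions, and only the MSE > 0.10 threshold comes from the results slide that follows.

```python
# Sketch: anomaly detection on vintage-year weather with an H2O autoencoder.
# File names and hyper-parameters are placeholders; the dataset itself isn't published here.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
typical = h2o.import_file("bordeaux_typical_years.csv")   # placeholder: 'typical' vintages
great   = h2o.import_file("bordeaux_great_years.csv")     # placeholder: 'great' vintages

ae = H2ODeepLearningEstimator(
    autoencoder=True,          # learn to reconstruct the inputs (no label)
    activation="Tanh",
    hidden=[3],                # small bottleneck; illustrative
    epochs=100,
)
ae.train(x=typical.columns, training_frame=typical)

# Per-row reconstruction MSE; years well above the slide's 0.10 cut-off get flagged.
mse_typical = ae.anomaly(typical)
mse_great   = ae.anomaly(great)
```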

Page 27: Alex Tellez, Deep Learning Applications

RESULTS (MSE > 0.10)

[Figure: mean square error by vintage year - vintages above the 0.10 threshold include 1961, 1982, 1989, 1990, 2000, 2005, 2009, and 2010]

Page 28: Alex Tellez, Deep Learning Applications

2014 BORDEAUX??

[Figure: mean square error for the 2013 and 2014 vintages - is 2014 anomalous?]

Page 29: Alex Tellez, Deep Learning Applications

DEEP AUTOENCODERS + K-MEANS EXAMPLE

Help cyclists with their health-related questions!

Page 30: Alex Tellez, Deep Learning Applications

CYCLING + __________

Problem:

New and Experienced Cyclists have questions about cycling + ______ (given topic). Let’s build a question + answer system to help!

ML Workflow:

1) Scrape thousands of article titles from the internet about cycling / cycling tips / cycling health, etc. from various sources.

2) Build Bag-of-Words Dataset on article titles corpus

3) Reduce # of dimensions via deep autoencoder

4) Extract ‘last layer’ of deep features and cluster using k-means

5) Inspect Results!

Page 31: Alex Tellez, Deep Learning Applications

BAG-OF-WORDS

Build dataset of cycling-related articles from various sources:

Article Title: “The Basics of Exercise Nutrition”
Pre-processing: lower case, remove ‘stopwords’, remove punctuation
Result: [ 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, …, 0 ]  (1s in the “basics”, “exercise”, “nutrition” columns)
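One way to build that bag-of-words matrix is scikit-learn’s CountVectorizer; the talk doesn’t say which tooling was used, so treat this as an illustrative sketch only.

```python
# Sketch: binary bag-of-words over article titles (lower-case, stopwords removed,
# punctuation dropped by the tokenizer), roughly matching the slide's pre-processing.
from sklearn.feature_extraction.text import CountVectorizer

titles = [
    "The Basics of Exercise Nutrition",
    # ...thousands more scraped cycling-related titles
]

vectorizer = CountVectorizer(
    lowercase=True,          # lower case
    stop_words="english",    # remove 'stopwords'
    binary=True,             # 0/1 presence flags, as in the slide's example vector
)
bow = vectorizer.fit_transform(titles)   # rows = titles, columns = vocabulary (~2,700 words)
```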

Page 32: Alex Tellez, Deep Learning Applications

DIMENSIONALITY REDUCTION

Use deep autoencoder to reduce # features (~2,700 words!)

[Figure: deep autoencoder - Encoder: 2,700 words > 500 hidden features > 250 H.F. > 125 H.F. > 50; Decoder: 50 > 125 H.F. > 250 H.F. > 500 hidden features > 2,700 words]

Example input: “The Basics of Exercise Nutrition”
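In h2o-py the layer sizes from the slide map onto the `hidden` parameter of an autoencoder; the input file is a placeholder and the epoch count is illustrative.

```python
# Sketch: deep autoencoder 2,700 -> 500 -> 250 -> 125 -> 50 -> 125 -> 250 -> 500 -> 2,700.
# Only the hidden layers are listed; input/output width comes from the frame itself.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
bow_hf = h2o.import_file("cycling_bag_of_words.csv")   # placeholder: ~2,700 word columns

deep_ae = H2ODeepLearningEstimator(
    autoencoder=True,
    activation="Tanh",
    hidden=[500, 250, 125, 50, 125, 250, 500],   # layer sizes from the slide
    epochs=20,                                   # illustrative
)
deep_ae.train(x=bow_hf.columns, training_frame=bow_hf)
```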

Page 33: Alex Tellez, Deep Learning Applications

K-MEANS CLUSTERING

For each article: Extract ‘last’ layer of autoencoder (50 deep features)

Example: “The Basics of Exercise Nutrition” > 50 ‘deep features’
DF1: -0.09330833, DF2: 0.167881429, DF3: -0.234307408, DF4: 0.247723639, DF5: -0.067700267, DF6: -0.094107866, …

K-Means Clustering
Inputs: Extracted 50 deep features for each cycling-related article
K = 50 clusters after grid-search of values
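Extracting the 50-unit bottleneck and clustering it might look like this, reusing `deep_ae` and `bow_hf` from the previous sketch; the layer index and seed are assumptions.

```python
# Sketch: take the 50-unit 'deep features' layer and cluster with k-means (k = 50).
from h2o.estimators.kmeans import H2OKMeansEstimator

# deepfeatures() returns the activations of a chosen hidden layer; index 3 is
# assumed to address the 50-unit bottleneck in the [500, 250, 125, 50, ...] stack.
deep_features = deep_ae.deepfeatures(bow_hf, 3)

km = H2OKMeansEstimator(k=50, seed=1)
km.train(training_frame=deep_features)

clusters = km.predict(deep_features)   # one cluster id per article title
```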

Page 34: Alex Tellez, Deep Learning Applications

RESULT: CYCLING + A.I.

Now we inspect the clusters!

Test Article Title: Fluid & Carbohydrate Ingestion Improve Performance During 1 Hour of Intense Exercise

Result: Clustered w/ 17 other titles (out of ~5,700)

Top 5 similar titles within cluster:

Caffeine ingestion does not alter performance during a 100-km cycling time-trial performance

Immuno-endocrine response to cycling following ingestion of caffeine and carbohydrate

Metabolism and performance following carbohydrate ingestion late in exercise

Increases in cycling performance in response to caffeine ingestion are repeatable

Fluid ingestion does not influence intense 1-h exercise performance in a mild environment

Page 35: Alex Tellez, Deep Learning Applications

HOW TO GET FASTER?

Test Article Title: Muscle Coordination is Key to Power Output & Mechanical Efficiency of Limb Movements

Result: Clustered w/ 29 other titles (out of ~5,700)

Top 5 similar titles within cluster:

Muscle fibre type efficiency and mechanical optima affect freely chosen pedal rate during cycling.

Standard mechanical energy analyses do not correlate with muscle work in cycling.

The influence of body position on leg kinematics and muscle recruitment during cycling.

Influence of repeated sprint training on pulmonary O2 uptake and muscle deoxygenation kinetics in humans

Influence of pedaling rate on muscle mechanical energy in low power recumbent pedaling using forward dynamic simulations

Page 36: Alex Tellez, Deep Learning Applications

WHAT’S NEXT??

Build smarter apps!!

[email protected]/h2oai

Hack with us!!

Page 37: Alex Tellez, Deep Learning Applications

HIGGS-BOSON PARTICLE

How did our Deep Neural Net do??

BEST Low-Level AUC: 0.73