Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks at H2OWorld2014

Unsupervised Learning in H2O

H2O World 2014

A. Tellez, S. Subramanian, L. Tashkevych, T. Nguyen

What is Unsupervised Learning?

Unsupervised Learning: Learning the internal structure of the data when there is no target to predict.

Supervised learning must stand on curated data whereas unsupervised learning requires no ‘answer book’.

Common Unsupervised Learning Approaches:

Clustering: k-means, mixture models, affinity propagation

Dimensionality Reduction: PCA, Autoencoders

Hidden Markov Models

Topic Extraction: NMF, LDA

Example: Single Malt Scotch

Single Malt Scotch: A whiskey made at one particular distillery from a mash that only uses malted grain (barley).

Must be aged at least 3 years in oak casks

Many famous distilleries are located in the northern regions of Scotland

Single Malt Dataset

The Single Malt Whiskey Dataset

85 distilleries from northern Scotland

12 descriptor features

E.g. Sweetness, Smoky, Tobacco, Honey, Spicy, Malty, etc.

Each descriptor rated from 0 (weak) to 4 (strong)

Dataset kindly provided here*.

How can we use our knowledge of unsupervised learning to learn more about single malt whiskeys?

Can we build a whiskey recommendation engine based on whiskeys we already like?

* Dataset Source: https://www.mathstat.strath.ac.uk/outreach/nessie/nessie_whisky.html

Dimension Reduction + K-Means

First, let’s reduce the 12 features to a lower dimensional space using Principal Component Analysis…

…7 principal components explain 85% of the variance in the dataset

Then, let’s use k-means clustering to determine the unique groups in the new PCA’d dataset

Grid Search shows that 11 clusters are appropriate

Pipe out the result and attach the original distillery labels to see which whiskeys cluster with each other, all using H2O!
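The PCA-then-k-means steps above can be sketched in plain NumPy. This is a minimal illustration, not the H2O workflow used in the talk: the ratings matrix is random synthetic data standing in for the 85 x 12 whiskey table, and the helper names `pca_reduce` and `kmeans` are hypothetical. Only the 85% variance target and the 11 clusters come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the whiskey table: 85 rows, 12 descriptors rated 0-4.
# These are random numbers, NOT the real distillery ratings.
ratings = rng.integers(0, 5, size=(85, 12)).astype(float)


def pca_reduce(x, target_variance=0.85):
    """Project x onto the fewest principal components reaching target_variance."""
    centered = x - x.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    n_components = int(np.searchsorted(cumulative, target_variance)) + 1
    n_components = min(n_components, len(cumulative))
    return centered @ vt[:n_components].T, cumulative[:n_components]


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    init_rng = np.random.default_rng(seed)
    centroids = x[init_rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every row to every centroid, shape (n_rows, k).
        distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1), centroids


scores, cumulative = pca_reduce(ratings, target_variance=0.85)
labels, _ = kmeans(scores, k=11)
print("components kept:", scores.shape[1])
print("cluster sizes:", np.bincount(labels, minlength=11))
```

Because the input here is random, the number of components kept will differ from the 7 reported for the real dataset.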

Model Results

I ENJOY:

OTHER WHISKEYS THAT CLUSTER WITH THESE:

Model Results Cont’d.

SOME OF YOU MAY LIKE:

OTHER WHISKEYS IN THE SAME CLUSTER:
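Once every distillery has a cluster label, the "other whiskeys in the same cluster" lookup is a small grouping step. The sketch below assumes a list of names and a parallel list of cluster labels; the names, labels, and the `recommend` helper are hypothetical placeholders, not the talk's actual assignments.

```python
from collections import defaultdict


def build_cluster_index(names, labels):
    """Map each cluster label to the list of names assigned to it."""
    index = defaultdict(list)
    for name, label in zip(names, labels):
        index[label].append(name)
    return index


def recommend(liked, names, labels):
    """Return names sharing a cluster with anything in `liked`, excluding `liked`."""
    label_of = dict(zip(names, labels))
    index = build_cluster_index(names, labels)
    liked_set = set(liked)
    suggestions = []
    for name in liked:
        for other in index[label_of[name]]:
            if other not in liked_set and other not in suggestions:
                suggestions.append(other)
    return suggestions


# Hypothetical names and cluster labels, for illustration only.
names = ["Distillery A", "Distillery B", "Distillery C", "Distillery D", "Distillery E"]
labels = [0, 1, 0, 2, 1]
print(recommend(["Distillery A"], names, labels))  # ['Distillery C']
```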

Example: Feeling like ramen?

Burning question: You like Japanese ramen. Where can you go for dinner tonight if you want ramen around Mountain View?

Ramen Yelp Dataset

Harvested all the known ramen shops around Mountain View and built our Yelp dataset:

Step 1: PCA

85% of cumulative variance in the dataset explained using 2 PCs

Second PC

Step 2: K-Means

Grid Search shows 4 clusters on PCA’d dataset

I really like this ramen joint:

I’m thinking these places for dinner tonight:
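The slides do not say which criterion the grid search used to settle on the number of clusters. One common choice, assumed here, is to run k-means for a range of k and compare the within-cluster sum of squares (WCSS), looking for the point where adding clusters stops helping. The sketch uses synthetic 2-D points with four well-separated groups (not the Yelp data) and a farthest-first initialisation, which is a convenience of this example; `kmeans_wcss` and `grid_search_k` are hypothetical names.

```python
import numpy as np


def farthest_first_init(x, k, rng):
    """Pick a random first centroid, then repeatedly the point farthest from all chosen."""
    centroids = [x[rng.integers(len(x))]]
    while len(centroids) < k:
        chosen = np.array(centroids)
        distances = np.linalg.norm(x[:, None, :] - chosen[None, :, :], axis=2)
        centroids.append(x[distances.min(axis=1).argmax()])
    return np.array(centroids, dtype=float)


def kmeans_wcss(x, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centroids = farthest_first_init(x, k, rng)
    for _ in range(n_iter):
        distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.sum(distances.min(axis=1) ** 2))


def grid_search_k(x, k_values):
    """Return {k: WCSS} for every candidate k."""
    return {k: kmeans_wcss(x, k) for k in k_values}


# Synthetic data: four tight groups of 25 points each.
data_rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
points = np.vstack([c + data_rng.normal(scale=0.5, size=(25, 2)) for c in centers])

wcss = grid_search_k(points, k_values=range(1, 8))
for k, value in wcss.items():
    print(k, round(value, 1))
```

On data like this the WCSS drops sharply up to k = 4 and flattens afterwards, which is the pattern a grid search over k looks for.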

Example: Bordeaux Wines

Bordeaux is the largest wine-growing region in France

700 million bottles of wine (red + white) annually

Some years are better than others: Great ($$$) vs. Typical ($)

Last Great Years: 2010, 2009, 2005, 2000

Buying Bordeaux ‘en primeur’

While wine is still barreled, purchasers can ‘invest’ in the wine before bottling and official public release

Advantage: Wines may be considerably cheaper during ‘en primeur’ period than official release.

Great Years: 2000, ’05, ’09

Red Obsession Trailer

Sri, there is a 3 minute movie trailer for red obsession that I will show but didn’t send due to size limitation in email.

Great Vintage vs. Typical Vintage

Question: Can we study the weather patterns in Bordeaux leading up to harvest to identify ‘anomalous’ weather years that correlate with Great Vintage vs. Typical Vintage?

The Bordeaux Dataset (1952–2014): Yearly data that measures:

Winter Rain (October to March of harvest year)

Average Summer Temp (April to September of harvest year)

Harvest Rain (August to September of harvest year)

Autoencoder + Anomaly Detection

In Steps:

Train an autoencoder model to learn Typical Vintage year weather patterns

Append Great Vintage year weather data to original dataset.

If Great Vintage year weather data does not match the learned weather pattern, the autoencoder will produce a high reconstruction error (MSE).

‘en primeur’ of ‘en primeur’: Can we use weather patterns to identify anomalous years which may be indicative of Great Vintage quality?
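The steps above can be sketched with a very small autoencoder written in NumPy. This is an illustration under stated assumptions, not the H2O model from the talk: the rows are synthetic standardized values standing in for (winter rain, summer temperature, harvest rain), the network is a hypothetical 3-2-3 design with a tanh hidden layer trained by full-batch gradient descent, and the flagging threshold (the largest error seen on the training rows) is a choice made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_rows(n, offset=0.0):
    """Synthetic rows where the third feature follows from the first two, plus an offset."""
    base = rng.normal(scale=0.5, size=(n, 2))
    third = 0.5 * base[:, 0] - 0.5 * base[:, 1] + rng.normal(scale=0.05, size=n) + offset
    return np.column_stack([base, third])


typical = make_rows(60)             # rows that follow the usual pattern
unusual = make_rows(5, offset=2.0)  # rows that break it


def train_autoencoder(x, n_hidden=2, lr=0.5, n_epochs=3000, seed=0):
    """Fit a d -> n_hidden -> d autoencoder by minimising mean squared error."""
    init = np.random.default_rng(seed)
    n, d = x.shape
    w1 = init.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = init.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(n_epochs):
        hidden = np.tanh(x @ w1 + b1)
        output = hidden @ w2 + b2
        # Gradient of mean((output - x)^2) with respect to the output.
        grad_out = 2.0 * (output - x) / (n * d)
        # Backpropagate through the linear decoder and the tanh layer.
        grad_hidden = (grad_out @ w2.T) * (1.0 - hidden**2)
        w2 -= lr * hidden.T @ grad_out
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * x.T @ grad_hidden
        b1 -= lr * grad_hidden.sum(axis=0)
    return w1, b1, w2, b2


def reconstruction_mse(x, params):
    """Per-row mean squared error between x and its reconstruction."""
    w1, b1, w2, b2 = params
    output = np.tanh(x @ w1 + b1) @ w2 + b2
    return np.mean((output - x) ** 2, axis=1)


params = train_autoencoder(typical)
typical_errors = reconstruction_mse(typical, params)
unusual_errors = reconstruction_mse(unusual, params)

threshold = typical_errors.max()  # assumed cutoff, chosen for this sketch
flagged = unusual_errors > threshold
print("mean error, typical rows:", typical_errors.mean())
print("mean error, unusual rows:", unusual_errors.mean())
print("flagged as anomalous:", flagged)
```

The model only ever sees the typical rows during training, so rows that break the learned relationship reconstruct poorly and stand out by their error.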

Autoencoder Results (MSE > 0.25)

1961 V

1989 V

1990 V

2000 V

2003 NV*

2005 NV

2009 V

2010 V

2011 NV*

2014 Bordeaux? You Decide!

2014 ??

2013 NV

Thank You!

What single malt whiskeys do you like?

Our GitHub has a link to the original whiskey dataset and the PCA + K-Means cluster assignments

Add to your Netflix: Red Obsession (2013), Somm (2012)

github.com/LenaTash/RH_MachineLearning

All work in this presentation was done using H2O (Thanks Sri!)

Questions + Comments?
