Gl conference2014 toolkits_alice

Machine Learning Toolkits in GraphLab Create Alice Zheng GraphLab, Inc.

description

GraphLab's Alice Zheng presents on using the toolkits within GraphLab Create to build data products.

Transcript of Gl conference2014 toolkits_alice

Page 1: Gl conference2014 toolkits_alice

Machine Learning Toolkits in GraphLab Create

Alice Zheng

GraphLab, Inc.

Page 2: Gl conference2014 toolkits_alice

Going Beyond Data Engineering

GraphLab Create enables Data Intelligence

•  Recommender systems for retailers
•  Fraud detection for financial institutions
•  Market segmentation and ad targeting
•  Churn prediction for telecom
•  Community detection and friend recommendation for social networks

© 2014 GraphLab, Inc.

Page 3: Gl conference2014 toolkits_alice

The Data Pipeline

Raw Data → Features → Models → Predictions

Stages: Data Engineering, Data Intelligence

Page 4: Gl conference2014 toolkits_alice

GraphLab Create Design Principles

•  Easy to use
•  Powerful
•  Fast
•  Composable

Page 5: Gl conference2014 toolkits_alice

Example: Movie Recommender

City of God

Wild Strawberries

The Celebration

Women on the Verge of a Nervous Breakdown

What do I recommend???

Page 6: Gl conference2014 toolkits_alice

Example: Movie Recommender

City of God

Wild Strawberries

The Celebration

La Dolce Vita

Women on the Verge of a Nervous Breakdown

Page 7: Gl conference2014 toolkits_alice

User-Movie Interaction Matrix

Columns (movies): Women on the Verge …, The Celebration, City of God, Wild Strawberries, La Dolce Vita

Rows (users): Bob, Anna, David, Ethan

Page 8: Gl conference2014 toolkits_alice

Matrix Factorization

User-item interactions ≈ User latent factors × Item latent factors + Information about users + Information about items
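The factorization on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not GraphLab Create's implementation: it fits only the latent-factor product with stochastic gradient descent and leaves out the side-information terms about users and items. The function names, hyperparameters, and the toy ratings are all assumptions made up for the example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def mse(ratings, U, V):
    """Mean squared error of the factor model on observed (user, item, rating) triples."""
    return sum((r - dot(U[u], V[i])) ** 2 for u, i, r in ratings) / len(ratings)


def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Fit user factors U and item factors V so that U[u] . V[i] approximates r."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(U[u], V[i])
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V


# Hypothetical toy data: (user index, item index, rating). Not from the talk.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 5),
           (2, 1, 5), (2, 3, 4), (3, 2, 4), (3, 4, 5)]

U, V = factorize(ratings, n_users=4, n_items=5)

# Score a user-item pair that was never observed (user 0, item 2).
score = dot(U[0], V[2])
```

The unobserved entries are where recommendations come from: once the factors are fitted, any user-item pair can be scored with `dot(U[u], V[i])` and the highest-scoring unseen items suggested.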

Page 9: Gl conference2014 toolkits_alice

Demo

Page 10: Gl conference2014 toolkits_alice

The Moral of the Story

•  Data scientists need the right tools for the right job
•  There is always a more clever model
•  There is probably some bug in your data
•  GraphLab Create
    •  Versatile, composable, automated
    •  Play, learn, build better models

Page 11: Gl conference2014 toolkits_alice

GraphLab Create Toolkits

•  Recommenders
    •  Item similarity, factorization machine, matrix factorization, non-negative matrix factorization, matrix factorization for ranking
•  Graph analytics
    •  PageRank, triangle counting, degree distribution, graph coloring, connected components, shortest path, k-core decomposition
    •  User-defined graph computation
•  Nearest Neighbors
    •  Brute-force and ball trees
•  Topic modeling
    •  LDA
•  Regression/Classification
    •  Linear regression, logistic regression, SVM, gradient boosted trees, neural networks/deep learning
•  Clustering
    •  K-Means
•  Other popular ML libraries
    •  Vowpal Wabbit

Page 14: Gl conference2014 toolkits_alice

Come to Training Day!

•  GraphLab data science training day tomorrow!
•  A full day of lectures and exercises
•  Data engineering, model building, deployment, all on GraphLab Create

Page 15: Gl conference2014 toolkits_alice

Speed + Scale

•  How much do you need?
•  How much data do you really have?

Page 16: Gl conference2014 toolkits_alice

Data Funnel

Raw Data (PB) → Features (GB to TB) → Models (MB)

Page 17: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

Extract, Transform, Load

Page 18: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

Extract, Transform, Load

Model Learning

Page 21: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

ETL

Page 22: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

ETL

Model Learning

Page 25: Gl conference2014 toolkits_alice

Benchmarks

Run Time of Item Similarity on Netflix Dataset

[Bar chart; run-time axis from 0 to 1800]

GraphLab Create (1 Node), 3.6 minutes

Mahout (5 Node), 29 minutes
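For readers unfamiliar with the workload being timed, here is a rough sketch of item similarity on a list of (user, item) interactions. It uses Jaccard similarity, one common choice; the slide does not say which measure or settings were benchmarked, and this brute-force pairwise loop is purely illustrative and says nothing about how either system achieves its run time. The function name and the toy interactions are made up for the example.

```python
from itertools import combinations


def jaccard_item_similarity(interactions):
    """Return {(item_a, item_b): similarity} for every pair of items.

    Similarity is |users who touched both| / |users who touched either|.
    """
    users_by_item = {}
    for user, item in interactions:
        users_by_item.setdefault(item, set()).add(user)

    sims = {}
    for a, b in combinations(sorted(users_by_item), 2):
        ua, ub = users_by_item[a], users_by_item[b]
        sims[(a, b)] = len(ua & ub) / len(ua | ub)
    return sims


# Hypothetical toy interactions: (user, item). Not from the Netflix dataset.
interactions = [("u1", "A"), ("u1", "B"),
                ("u2", "A"), ("u2", "B"),
                ("u3", "A"), ("u3", "C")]

sims = jaccard_item_similarity(interactions)
```

The pairwise loop grows quadratically with the number of items, which is why run time on a dataset the size of Netflix's is a meaningful benchmark.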

Page 26: Gl conference2014 toolkits_alice

Become a GLC User!

•  We push the frontier of the industry
•  ... and our customers guide us
•  Our features are customer driven
•  Tell us what you think!