Beauty and Big Data

Posted on 27-Aug-2014


Transcript of Beauty and Big Data

Beauty and Big Data [Made possible by H2O and Tableau]

Amy Wang

“A data scientist knows more statistics than a computer scientist and more computer science than a statistician.”

What is H2O?

Math Platform: Open source in-memory prediction engine

• Parallelized and distributed algorithms that make the most of multithreaded systems

• GLM, Random Forest, GBM, PCA, etc.
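To make "parallelized and distributed algorithms" a little more concrete, here is a minimal sketch of the general pattern: split the data into chunks, compute a partial result per chunk on a worker pool, then combine the partials. This is a generic map/reduce illustration in plain Python, not H2O's implementation; `partial_stats` and `chunked_mean` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Map step: reduce one chunk to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, workers=4):
    """Compute the mean of `values` by combining per-chunk partial results."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # Reduce step: partial sums and counts combine regardless of chunk order.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    data = list(range(1, 10001))
    print(chunked_mean(data))
```

Threads in CPython will not speed up a CPU-bound sum like this one; the point is the shape of the computation, where each chunk can be processed independently and only small partial results need to be merged.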

API: Easy to use and adopt

• Written in Java – perfect for Java programmers

• REST API (JSON) – drives H2O from R, Python, Excel
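Driving a server through a REST API with JSON responses comes down to building an HTTP request and decoding the reply. The sketch below shows that shape using only the Python standard library. The endpoint path, parameter names, host, port and response fields are all hypothetical placeholders, not H2O's documented API; check the H2O documentation for the real endpoints.

```python
import json
from urllib.parse import urlencode, urlunsplit


def build_request_url(host, port, endpoint, params):
    """Build a GET URL for a JSON-over-HTTP API call."""
    query = urlencode(sorted(params.items()))
    return urlunsplit(("http", f"{host}:{port}", endpoint, query, ""))


def parse_response(body):
    """Decode a JSON response body into a Python dict."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


if __name__ == "__main__":
    # Hypothetical endpoint and parameters, for illustration only.
    url = build_request_url("localhost", 54321, "/example/model.json",
                            {"family": "binomial", "source": "data.hex"})
    print(url)
    # Hypothetical response body; a real server defines its own fields.
    print(parse_response('{"status": "done", "rows": 1000}'))
```

Because the interface is plain HTTP and JSON, any client that can issue a request and parse the reply (R, Python, a spreadsheet macro) can drive the same server.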

Big Data: More data? Or better models? Both?

• Use all of your data – model without downsampling

• Run a simple GLM or a more complex GBM to find the best fit for the data

• More Data + Better Models = Better Predictions
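The idea of trying a simple model and a more complex one, then keeping whichever fits best, can be sketched as a comparison on held-out data. The two candidates here are deliberately tiny stand-ins (a mean-only baseline and a one-variable least-squares line), not H2O's GLM or GBM, and all function names are hypothetical.

```python
def fit_mean(xs, ys):
    """Simplest candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Richer candidate: ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def best_model(candidates, train, valid):
    """Fit every candidate on `train`; return the name with lowest validation MSE."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])
    valid = ([6, 7], [13, 15])
    name, scores = best_model({"mean": fit_mean, "line": fit_line}, train, valid)
    print(name, scores)
```

Scoring on data the models were not fitted to is what makes the comparison fair: a more flexible model should win only if it actually predicts better.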

[Architecture diagram: H2O Prediction Engine. Data sources: SQL, HDFS, NoSQL, S3. Interfaces: R, JSON, Scala, Java, Python, feeding Intelligent Enterprise Applications. Engine components: Nano Fast Scoring Engine, Memory Manager, Columnar Compression, Query Processor, R-engine, In-Mem Map Reduce. Algorithm labels: Ensembles, Solvers, Deep Learning, Cluster, Classify, Regression, Trees, Forest, Boosting, Gradients, Processes. Stated throughput: 2M rows ingested/sec, 50M rows regression/sec, 750M rows aggregated/sec. Deployment: On Premise, On/Off Hadoop, On EC2.]

Installation Process

Start playing with H2O with R yourself!

Grab H2O and our R package:

• Download from website: 0xdata.com/downloads

• Build from git: https://github.com/0xdata/h2o

Get support at:

• http://docs.0xdata.com/

Demo: Big Data Workflow using R with H2O

OSEMN

Obtain and Scrub

Explore [in R or Tableau]

Model [in H2O] and explore the different models

iNterpret [in Tableau]
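The OSEMN stages listed above (Obtain, Scrub, Explore, Model, iNterpret) can be read as a pipeline of small functions, each consuming the previous stage's output. The sketch below walks a toy inline dataset through all five in plain Python. It stands in for the R, H2O and Tableau steps of the demo and is not taken from it; the data, column names and function names are invented for illustration.

```python
import csv
import io
import statistics

# Toy data with two deliberately bad rows (a missing value and a non-number).
RAW_CSV = """height_cm,weight_kg
170,65
180,80
,70
165,abc
175,72
"""


def obtain(text):
    """Obtain: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: keep only rows where every field parses as a number."""
    clean = []
    for row in rows:
        try:
            clean.append({key: float(value) for key, value in row.items()})
        except (TypeError, ValueError):
            continue
    return clean


def explore(rows):
    """Explore: compute the mean of each column."""
    return {key: statistics.mean(r[key] for r in rows) for key in rows[0]}


def model(rows, x="height_cm", y="weight_kg"):
    """Model: fit a least-squares line y = intercept + slope * x."""
    mean_x = statistics.mean(r[x] for r in rows)
    mean_y = statistics.mean(r[y] for r in rows)
    sxx = sum((r[x] - mean_x) ** 2 for r in rows)
    sxy = sum((r[x] - mean_x) * (r[y] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpret(slope, intercept):
    """iNterpret: turn the fitted numbers into a readable statement."""
    return f"Each extra cm of height adds about {slope:.2f} kg (intercept {intercept:.1f})."


if __name__ == "__main__":
    rows = scrub(obtain(RAW_CSV))
    print(explore(rows))
    print(interpret(*model(rows)))
```

Keeping each stage as its own function mirrors the way the demo hands work from one tool to the next: any stage can be swapped out without touching the others.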

[Diagram labels: H2O, Data, REST API, Local Socket Server]

Demo: Big Data Modeling Visualization in Tableau through R with H2O

A little about us

Advisors: Systems, Data, File Systems and Hadoop

• Doug Lea: ACM Fellow, malloc for C, fork-join, Java memory model, SUNY Oswego

• Chris Pouliot: VP of Data Science, Lyft; formerly Netflix, Google

• Dhruba Borthakur: HDFS, Hive, Facebook

Scientific Advisory Council

• Stephen Boyd: Professor of Electrical Engineering, Stanford

• Rob Tibshirani: Professor of Health Research and Policy, and Statistics, Stanford

• Trevor Hastie: Professor of Statistics, Stanford

Investors

• Jishnu Bhattacharjee: Nexus Venture Partners

• Anand Babu Periasamy: Founder, Gluster (RedHat)

• Anand Rajaraman: Founder, Junglee (Amazon), Kosmix (WalmartLabs)

• Dipchand “Deep” Nishar: SVP of Products & UX (LinkedIn)

We’ve Got the Who’s Who of Predictive Analytics