DataCanvas: Big Data Analytic Flow in Cloud

Post on 27-Jun-2015


description

A slide deck explaining what DataCanvas is: a cloud service that lets businesses create, manage, and share big data analytics pipelines.

Transcript of DataCanvas: Big Data Analytic Flow in Cloud

BIG DATA ANALYTIC FLOW IN CLOUD

Lei Fang@DataCanvas.io

Empower big data analytics for business

• 16.9B USD in 2015

• 40% Big data project

• Hadoop: CAGR 58%, 2.2B in 2020

• Volume

• Velocity

• Variety

Super hot in

• Government

• Communication

• Media

• Banking

• Manufacturing

Technology

Infrastructure: IaaS, SaaS, DaaS

Application: BI, social analytics, visualization…

Domain solution: Finance, Retail, Insurance

Development: Data scientist, DevOps

Business process: Operation, Support

ANALYTICS IS THE

Make data live
• Data sitting in storage generates no value

Revenue and profit from data
• Application and solution to get insights from data
• Link insights with business
• Don’t stop at visualization or report

Advanced analytics is the engine of business solution
• Fraud detection
• Customer retention

COMMON ANALYTICS SCENARIOS

Data analysis
• Example: Estimate customer lifetime value
• User: data scientist
• Needs: flexibility to explore and faster iteration

Product analysis
• Example: How many female customers visit the website home page and leave within fewer than 5 clicks?
• User: product manager, data analyst, marketing team
• Needs: no complex coding, SQL query at most

Predictive service
• Example: Is this transaction a fraud?
• User: developer and data scientist
• Needs: pipeline processing
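The product analysis question above can be answered with a single SQL query. The following is a minimal sketch using Python's built-in sqlite3 module; the `sessions` table, its columns, and the sample rows are hypothetical and only serve to illustrate the shape of such a query, not how DataCanvas itself stores or queries data.

```python
import sqlite3


def count_quick_exits(conn, max_clicks=5):
    """Count distinct female customers who landed on the home page
    and left after fewer than max_clicks clicks."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT customer_id)
        FROM sessions
        WHERE gender = 'F'
          AND landing_page = 'home'
          AND clicks < ?
        """,
        (max_clicks,),
    ).fetchone()
    return row[0]


def build_demo_db():
    """Create an in-memory database with a hypothetical sessions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions ("
        "session_id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "gender TEXT, landing_page TEXT, clicks INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sessions (customer_id, gender, landing_page, clicks) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "F", "home", 3),
            (1, "F", "home", 2),     # same customer again, counted once
            (2, "F", "home", 9),     # stayed too long
            (3, "M", "home", 1),     # not female
            (4, "F", "pricing", 2),  # did not land on the home page
            (5, "F", "home", 4),
        ],
    )
    return conn


if __name__ == "__main__":
    print(count_quick_exits(build_demo_db()))
```

The threshold is passed as a query parameter so an analyst can change "fewer than 5 clicks" without editing the SQL text.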

WHAT DOES DATACANVAS ADDRESS

Powering all these scenarios
• Data analysis: Flexible
• Product analysis: Intuitive
• Prediction service: Complex processing

Enable application, solution and business process

Platform to enable application and connect infrastructure

Application: Recommendation, Anomaly Detection, Operation Analytics

DataCanvas: Service, Pipeline

Infrastructure: Hadoop (Hive/Pig), RDBMS, NoSQL, Spark

• Big data challenges span services, environments and even locations

Pipeline stages: Data Generation, Storage, Processing, Reporting

• An orchestration platform is required to manage and connect steps in the pipeline

• Bring Pipeline to the game

No more central data store, bring computation to data, not vice versa!

• Unify resource

• Optimize workload

• Automation
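To make the idea of an orchestration layer that manages and connects pipeline steps more concrete, here is a minimal sketch in Python. The `Pipeline` class, its `step` and `run` methods, and the four demo steps are hypothetical names chosen for illustration; this is not DataCanvas's actual API. It assumes Python 3.9+ for the standard-library `graphlib` module and runs each step once all of its upstream steps have finished.

```python
from graphlib import TopologicalSorter


class Pipeline:
    """Toy orchestrator: named steps plus the dependencies between them."""

    def __init__(self):
        self._steps = {}  # step name -> callable
        self._deps = {}   # step name -> set of upstream step names

    def step(self, name, func, after=()):
        """Register a step that runs after the listed upstream steps."""
        self._steps[name] = func
        self._deps[name] = set(after)

    def run(self):
        """Run steps in dependency order; return (order, results by name)."""
        results = {}
        order = list(TopologicalSorter(self._deps).static_order())
        for name in order:
            # Upstream outputs are passed as arguments, sorted by step name.
            inputs = [results[dep] for dep in sorted(self._deps[name])]
            results[name] = self._steps[name](*inputs)
        return order, results


def build_demo():
    """Hypothetical four-stage flow: generate, store, process, report."""
    p = Pipeline()
    p.step("generate", lambda: [3, 1, 2])
    p.step("store", lambda rows: sorted(rows), after=["generate"])
    p.step("process", lambda rows: sum(rows), after=["store"])
    p.step("report", lambda total: f"total={total}", after=["process"])
    return p


if __name__ == "__main__":
    order, results = build_demo().run()
    print(order)
    print(results["report"])
```

Because the dependencies are declared as data rather than buried in scripts, the same description can drive scheduling and also document what is actually running.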

Pain points

• Unmanageable
• Redundancy
• Hard to iterate fast
• Gap between documentation and actual workflow
• Monster configuration
• Spaghetti scripts
• No reuse
• No idea what’s actually running

WHAT IS DATACANVAS

• Drag & drop to run data flow
• Public or private cloud
• Intuitive job management

• Module repository
• Built-in library
• Make your own recipe
• Powering advanced analytics

• Business solution template
• Address common applications
• Fully customizable

• Team collaboration
• Flow sharing
• Module sharing
• This is the BEST documentation

VALUE

Workflow + Scheduling (Operation)
• User experience
• Production quality
• Easy ops

Module (Developer/Data scientist)
• Data ETL
• Machine learning
• Module repository

Solution Template (Business)
• Business requirement
• Recommendation
• Fraud detection
• Sentiment analysis

WHY CONTAINER MATTERS

• Seamlessly connect to any existing/upcoming computation infrastructure

• Enabler for module management and sharing

• Support Lambda: Processing + Serving + Visualization

Lambda Architecture

COMPETITORS

Compared: AWS DP, Oozie, AzureML, MortarData, Azkaban, DataCanvas

Criteria:
• Workflow + Scheduling
• Module management
• Solution template
• Multiple Env support
• Collaboration + Sharing
• Cloud service

DataCanvas = ((Workflow + Scheduler) * Drag & drop * Module composition ) ^ Solution @ Cloud

Legend: Good / Not that great / Bad or not supported

BUSINESS MODEL

Subscription

Services charged by tier: Startup, Premium, Enterprise

Free
• 1 user
• Unlimited projects
• Limited workload, good for evaluation
• Forum support

Startup
• Unlimited users
• Unlimited projects
• Decent workload, 3-5 jobs in parallel
• Email support

Premium
• Unlimited users
• Unlimited projects
• Significant workload, >20 jobs in parallel
• Email support

Enterprise
• Unlimited users
• Unlimited projects
• Workload on scale
• Full support

Annual Support Package
• For Premium and Enterprise customers
• Forum support, Email support with SLA, Telephone support

TARGET CUSTOMER

Data scientist
• Assembly line to facilitate exploration
• Team collaboration

Analyst
• Drag and drop to find insights, need any more reason?

Manager
• Faster iteration
• Shorter time to deliver project
• Easier to maintain

WHERE ARE WE NOW

Demo upon request (contact@zetdata.com)

DataCanvasIO @ GitHub

THANK YOU