Open Source Framework for Deploying Data Science Models and Cloud Based Applications by Noelle Sio...


Page 1: Open Source Framework for Deploying Data Science Models and Cloud Based Applications by Noelle Sio of Pivotal

Open Source Framework for Deploying Data Science Models and Cloud Based Applications

Pivotal Data Science Team

Page 2
Page 3

What happened?

What should I do about it? This is where Data Science comes in

What will happen next?

Page 4

What Thought Leaders Have In Common

• Large amounts of structured and unstructured data
• Deep personal knowledge of their audience
• Quantified understanding of their products
• Data-driven culture
• User experience optimized by data science

Page 5

Internal Data Sources
• Viewership
• Advertisements
• Merchandise
• Sales & Finance

Typical External Sources
• Market Research & Competitive Information
• Audience Demographics

Semi/Unstructured Data
• Clickstream
• Social Media
• Content

Page 6

Data Science Impact

Business Motivation
• Increase Demand
• Build Brand Equity
• Increase Production Efficiency
• Optimize Ad Spend Efficiency
• Increase Customer Engagement

Data Science Opportunities
• Campaign Optimization
• Marketing Mix Models
• Customer segmentation
• Affinity analysis
• Social media analytics
• Supply/Demand forecasting

Increase Revenue

Reduce Cost

Page 7

Example Use Case: Ratings Prediction

Use Case: Increase ratings across viewer demographics

How:
• Data: Viewership, transcripts and show data combined in big data platform
• Model: Machine learning used to identify the impact of production decisions on viewership

Insights

Page 8

Models Insights Actions

Models are built to answer business questions, e.g. what makes viewers tune in and tune out?

Data Scientists interpret models for answers, e.g. on-screen arguments make viewers tune out.

Report

Dashboard

BI Tool

Email

Presentation

Cloud App

End User

A good insight drives action that will generate value for stakeholders

Page 9

Revisiting Rating Prediction Use Case

Model exposed to end users via cloud application allowing what-if scenario building
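
To make the what-if idea concrete, here is a minimal sketch of how an application might let a user change a production lever and see the predicted effect on ratings. The slides do not describe the actual model or its inputs, so the `Episode` fields, the linear form and every coefficient below are hypothetical and for illustration only. The negative weight on arguments simply echoes the earlier example insight that on-screen arguments make viewers tune out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Episode:
    """Hypothetical production levers for one episode (not from the slides)."""
    minutes_of_arguments: float
    guest_appearances: int
    ad_breaks: int


# Illustrative coefficients only; a real model would be fitted on viewership data.
INTERCEPT = 2.0
WEIGHTS = {
    "minutes_of_arguments": -0.05,
    "guest_appearances": 0.30,
    "ad_breaks": -0.10,
}


def predict_rating(episode: Episode) -> float:
    """Score an episode with a simple linear model over the levers."""
    score = INTERCEPT
    for name, weight in WEIGHTS.items():
        score += weight * getattr(episode, name)
    return score


def what_if(baseline: Episode, **changes) -> float:
    """Return the predicted change in rating if the given levers are changed."""
    scenario = replace(baseline, **changes)
    return predict_rating(scenario) - predict_rating(baseline)


baseline = Episode(minutes_of_arguments=10, guest_appearances=1, ad_breaks=4)
# Scenario: cut on-screen arguments from 10 minutes to 4.
delta = what_if(baseline, minutes_of_arguments=4)
print(f"Predicted rating change: {delta:+.2f}")
```

In a deployed application the scoring function would call the fitted model rather than hard-coded weights; the point of the sketch is only the baseline-versus-scenario comparison.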

Page 10

Characteristics Of Actionable Insights

• Real-time
• Scalable
• Social
• Relevant
• Accessible
• Open

Page 11

Benefits Of Cloud Based Applications

Service failure or data loss at scale

Long innovation cycles

Poor experience at scale

Resilient, scale-out messaging and processing

Agile development with cloud based data services

Low-latency, in-memory computing

Page 12

Open Source Analytics Ecosystem

Media companies benefit from algorithmic breadth and scalability for building and socializing data science models

MLlib

PL/X

Algorithms Visualization

Best of breed in-memory and in-database tools for an MPP platform

Page 13

Example Scalable Open Source Platform

Hadoop++: Data Science modeling tools complement the Hadoop platform, e.g. SQL on Hadoop (such as HAWQ), Python/R interfaces to SQL, and Apache Spark.

http://opendataplatform.org/

Apps

Data

Analytics

Leading Media companies are moving towards a platform with Hadoop at the core.

Page 14

Data Science Pipeline On Hadoop++

MLlib

PL/X

Data Lake

Hadoop++

Structured + Unstructured Data

Page 15

Open Source Framework For Ratings Prediction

Data Lake: Contains structured + unstructured data

MLlib

PL/X

Insights and Model Results

Ratings Predictions

Business Levers

What-if Scenario Application

Hosted on

Page 16

Expanding The Framework To Include Impression Forecasting Modeling

Gather video ads impression stats

Data Lake Ingest

Message Broker

Simulate Ad Server Behavior

Impression Forecasts

Business Levers

Hosted on

Business Metrics Dashboard

MLlib

PL/X
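
The slide does not say which forecasting model is used for impressions. As a stand-in, the sketch below uses simple exponential smoothing to produce a one-step-ahead forecast from a short history of impression counts. The function name, the `alpha` value and the sample counts are all assumptions made for illustration.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    history: past impression counts, oldest first (hypothetical data).
    alpha:   smoothing factor in (0, 1]; higher values weight recent data more.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    level = history[0]
    for observed in history[1:]:
        # Blend the newest observation with the running level.
        level = alpha * observed + (1 - alpha) * level
    return level


# Hypothetical hourly video ad impression counts.
hourly_impressions = [1000, 1200, 1100, 1300]
forecast = exponential_smoothing_forecast(hourly_impressions, alpha=0.5)
print(f"Forecast for the next hour: {forecast:.0f} impressions")
```

A production forecast would likely account for seasonality and campaign schedules; this only shows the shape of a function that a dashboard could call on the gathered impression stats.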

Page 17

Measuring Audience Engagement: Workflow

Twitter Decahose (~55 million tweets/day)

Source: http Sink: hdfs

HDFS

External Tables (PXF)

Parallel Parsing of JSON (PL/Python)

Topic Analysis through MADlib (pLDA)

Unsupervised Sentiment Analysis (PL/Python)

Nightly Cron Jobs

Hosted on
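
The parsing and sentiment steps in this workflow run as PL/Python functions, whose bodies are ordinary Python. The sketch below shows what the per-tweet logic could look like: parse one raw JSON line, skip malformed input, and score the text. The slides do not name the sentiment method, so a tiny word-list scorer is swapped in here as an assumption; the word lists, function names and sample tweets are hypothetical.

```python
import json

# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "boring", "awful"}


def parse_tweet(raw_line):
    """Return (id, text) for one raw JSON line, or None if it is unusable."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "text" not in record:
        return None
    return record.get("id"), record["text"]


def sentiment_score(text):
    """Count positive words minus negative words (a crude lexicon score)."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


raw_lines = [
    '{"id": 1, "text": "I love this show, great finale!"}',
    '{"id": 2, "text": "So boring tonight"}',
    "not json",
]

scores = {}
for line in raw_lines:
    parsed = parse_tweet(line)
    if parsed is not None:
        tweet_id, text = parsed
        scores[tweet_id] = sentiment_score(text)

print(scores)
```

Because each line is handled independently, the same function can be applied row by row over an external table, which is what lets the database parallelize the work across segments.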

Page 18

Key Takeaways

• Blended data sets lead to richer models and more valuable insights
• Turn Data Science models and insights into value generating actions through data driven applications
• Open source = power and flexibility
• Platform extensibility is key to supporting Data Science
• Turnkey PaaS is available through CloudFoundry, including infrastructure monitoring, server configuration and scalability

Page 19

THANK YOU!