Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Transcript of Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action

Page 1: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action

1 © Copyright 2013 EMC Corporation. All rights reserved.

Pivotal Data Scientists on the Front Line: Examples of Data Science in Action

Getting to Know Your Customer with Big & Fast Data

Pivotal Data Science Team

Page 2: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Welcome – It’s a Pleasure to Meet You

• The Launch of Pivotal

• Pivotal Data Science Team

• Getting to Know Your Customer
  - Meet your customer: Build Models
  - Learn more about your customer: More Data
  - Adapt to your customer: Dynamic Models

• Let’s Get Started: Pivotal Data Science Labs

• Q & A

Page 3: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action

A NEW PLATFORM FOR A NEW ERA

Page 4: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Pivotal, The New EMC Spin-out

Private Cloud

Public Cloud

Pivotal is building a new platform for a new era.

This platform enables customers to build a new class of applications that leverage Big and Fast Data.

All this with the power of cloud independence.

Page 5: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Introducing the Pivotal Stack

Cloud Storage

Virtualization

Data & Analytics Platform

Cloud Application Platform

Data-Driven Application Development

Pivotal Data Science Labs

Page 6: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Pivotal Services: Rapid Time to Value

Pivotal Labs: Quickly create and deploy new applications
• Proven methodology to remove risk and accelerate results

Pivotal Data Science Labs: A proven data science practice to accelerate analytics projects
• Drive business value through data analytics

Open Source Support: Collaborative and customer-driven open source support, services and co-development

Page 7: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Pivotal Data Science

Page 8: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Tell Me About Data Science

What it is:
– Data preparation
– Data exploration and visualization
– Feature creation based on data and domain knowledge
– Quantitative modeling & model validation
– Scoring data

What it is not:
– A set of tools
– Application development

Page 9: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Platform-Driven Data Science Paradigm Shift

1. Modeling on more data

2. Rapid ingestion of new data

3. Re-use of valuable data

4. Faster model building

5. Scalable advanced modeling

6. Faster model refreshing

7. Faster data scoring

Page 10: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Pivotal Data Science Knowledge Development

Page 11: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Data Science Strategy

Point Model Development

Multiple Model Development

Transformation to “Predictive Enterprise”

Pivotal Data Science Labs

Page 12: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Getting to Know Your Customer: Deeper Insights With Data Science

Page 13: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


More Data Science, Deeper Insights

Meet Your Customer: Build Models

Learn More About Your Customer: More Data

Adapt to Your Customer: Dynamic Models

Page 14: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


The New Normal: “An Audience of One”

[Figure: data ecosystem diagram. Recoverable labels: Data Devices; Data Users/Buyers; Data Aggregators; Media; Banks; Delivery Services; Marketers; Government; Individuals; Employers; Analytic Services; Advertising; Catalog Co-ops; List Brokers; Websites; Information Brokers; Credit Bureaus; Media Archives; Content; Government; Phone/TV; Internet; Ad Agency; Retail]

Page 15: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Targeting & Retention

Social Media Analysis

Campaign optimization

Data-driven Customer Analytics

Transaction History

Purchases

Clickstream

Customer Data

Unified data supporting re-usable predictive models

[Figure axis: Data Size (GB, TB, PB)]

Page 16: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


More Data Science, Deeper Insights

Meet Your Customer: Build Models

Learn More About Your Customer: More Data

Adapt to Your Customer: Dynamic Models

Page 17: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Who Are Our Customers?

• One way of learning about customers is to divide them into characteristic groups

• This is called segmentation

• Let’s take a look at a segmentation exercise Pivotal did with a large medical insurance company…

Page 18: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


What Did We Have to Work With?

Product Sales

Claims Data

Provider Information

Consumer Data

Population Served

Page 19: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


So What Did We Do With This Data?

Before: Random Clusters

After: Cohesive Clusters

Page 20: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


What Was the Outcome?

New Clinics

Neighborhood Clinics

Pirate Clinics

Established Clinics

Page 21: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Where to Next? Segmentation as Foundational Analytics

Churn Models

Micro-segmentation

Marketing Mix Model

Cross-Sell/Up-Sell Optimization

Lifetime Value Calculation

Consumer-Provider Recommendation Engine

Page 22: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Summary: Get to Know Your Customer by Building Data-Driven Models

Objective:
• Improve understanding of customer

Data:
• Existing EDW sources
• New big data sources that capture customer demographics, such as the publicly available US Census

Data Science Methodology:
• Segmentation via k-means clustering

Business Impact & Improvement:
• Dramatically increase familiarity with makeup and behavior of customer base
• Drive targeted marketing efforts
• Lay foundation for higher-quality future models
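
A minimal sketch of the k-means segmentation named in this summary, assuming plain Lloyd's algorithm written in standard-library Python. The customer features, their values, and the choice of k=2 are hypothetical; the slides do not show the actual features or tooling used in the engagement.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical customers: (visits per year, average claim in $100s)
customers = [(1, 2), (2, 1), (1, 1), (9, 8), (10, 9), (9, 10)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

On real data the features would normally be scaled first, and k would be chosen by comparing cluster cohesion across several values rather than fixed up front.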

Page 23: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


More Data Science, Deeper Insights

Meet Your Customer: Build Models

Learn More About Your Customer: More Data

Adapt to Your Customer: Dynamic Models

Page 24: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Churn Models for Telecom Industry

Goal
– Identify customers who are likely to churn and prevent them from leaving

Challenges
– Cost of acquiring new customers is high
– Acquisition cost is hard to recoup if the customer is not retained long enough
– Barrier to switching providers is low
– With mobile number portability, the barrier to switching is even lower

Good News
– Cost of retaining existing customers is lower!

Page 25: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Structured Features for Churn Models

The problem is extensively studied with a rich set of approaches in the literature

Device, Texting Stats, Call Stats, Rate Plans, Customer Demographics

These features are great, but the models soon hit a plateau with structured features!

Page 26: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Blending the Unstructured with the Structured

What other sources of previously untapped data could we use?

Are our customers happy? Where? What segments? What are the common topics in their conversations online?

Page 27: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Sentiment Analysis and Topic Models

[Diagram labels: Unstructured Data (External, Internal); Sentiment Analysis Engine (Classifier); Topic Engine (LDA); Topic Dashboard; Structured Data: EDW; Better Predict Likelihood to Churn]
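
A minimal sketch of the blending idea on this slide, assuming a toy word-list scorer standing in for the sentiment classifier. The word lists, the memo text, and the structured feature values are all hypothetical; the slides describe a trained classifier and an LDA topic engine, which this does not reproduce.

```python
# Hypothetical word lists; a real sentiment engine would be a trained classifier.
POSITIVE = {"great", "happy", "thanks", "resolved"}
NEGATIVE = {"dropped", "cancel", "slow", "overcharged"}


def sentiment_score(memo):
    """Return (positive count - negative count) divided by memo length in words."""
    words = memo.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def blend(structured, memo):
    """Append the text-derived score to a structured feature vector."""
    return list(structured) + [sentiment_score(memo)]


# Hypothetical structured features: tenure in months, monthly minutes, devices
row = blend([24, 310.5, 2], "calls dropped again want to cancel")
print(row)
```

The point of the design is that the text signal becomes one more column, so the existing churn model can be retrained on the widened feature vector without changing its structure.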

Page 28: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Topic Clouds from Twitter - An Example

Baby shower & Coupons: 13%

Convenience: 26%

Store experience: 13%

Misc: 32%

Promotions, deals: 17%

Page 29: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Summary: More Data to Drive Additional Customer Insights

Objective:
• Improve accuracy of churn models by blending structured features with unstructured text

Data:
• Existing structured features (call data records, device type, rate plans etc.)
• Call center memos

Data Science Methodology:
• Sentiment Analysis and Topic Modeling

Business Impact & Improvement:
• Achieved 16% improvement in ROC curve for Churn prediction
• Topic Models automatically identified common themes in call center memos
• Laid foundation for Text Analytics
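
The slide does not say how the ROC improvement was measured. A minimal sketch, assuming it refers to the area under the ROC curve (AUC), of how a structured-only model and a blended model could be compared. The labels and scores below are hypothetical and are not the results reported above.

```python
def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical hold-out set: 1 = churned, 0 = stayed
churned = [1, 0, 1, 0, 0, 1]
structured_only = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]  # scores from structured features
blended = [0.7, 0.4, 0.6, 0.3, 0.5, 0.9]          # scores after adding text features

auc_before = roc_auc(churned, structured_only)
auc_after = roc_auc(churned, blended)
print(auc_before, auc_after)
```

Both models are scored on the same hold-out customers, so any difference in AUC reflects the added features rather than a change in the evaluation set.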

Page 30: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


More Data Science, Deeper Insights

Meet Your Customer: Build Models

Learn More About Your Customer: More Data

Adapt to Your Customer: Dynamic Models

Page 31: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


State of Data at Telco Company

Customer Segments: Multi-Gadget Families, Affluent Matures, Thrifty Families, High Tech Singles, Budget Singles, Seniors

New Data Sources: Internet Deep Packet Inspection, TV Consumption (Linear), Video On Demand Consumption

Page 33: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Newly Identified Behavior-Based Segments

Subscribers

• Moderates

• OTT & Data Heavyweights

• Portable OTT Entertainment Seekers: iPhone Heavy, Android Heavy, iPad Heavy

• In-Home OTT Entertainment Seekers

• In-Home Native Content Seekers: VOD Heavy, TV Heavy

Page 34: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Going Further: Crossing Behavior-Based Segments on Existing Customer Segments

Newly Discovered Usage-Based Segments:
• Moderates
• OTT & Data Heavyweights
• In-Home OTT Entertainment Seekers
• Portable OTT Entertainment Seekers - iPhone Heavy
• Portable OTT Entertainment Seekers - Android Heavy
• Portable OTT Entertainment Seekers - iPad Heavy
• In-Home Native Content Seekers - VOD Heavy
• In-Home Native Content Seekers - TV Heavy

Existing Segments:
• Multi-Gadget Families
• Affluent Matures
• Thrifty Families
• Budget Singles
• Seniors
• High Tech Singles

Customized Micro-Segments!
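
A minimal sketch of the crossing step, assuming each subscriber already carries one existing segment label and one behavior-based segment label. The subscriber records below are hypothetical; only the segment names come from the slides.

```python
from collections import Counter

# Hypothetical subscriber records: (existing segment, behavior-based segment)
subscribers = [
    ("Seniors", "TV Heavy"),
    ("Seniors", "TV Heavy"),
    ("High Tech Singles", "iPhone Heavy"),
    ("Multi-Gadget Families", "iPad Heavy"),
    ("High Tech Singles", "Android Heavy"),
]

# Each distinct pair of labels is one micro-segment; the count is its size.
micro_segments = Counter(subscribers)

for (existing, behavior), size in micro_segments.most_common():
    print(f"{existing} x {behavior}: {size}")
```

With six existing segments and eight behavior-based segments, the cross yields up to 48 micro-segments, so in practice sparse cells would likely be merged or ignored.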

Page 35: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Driving New Business Value by Leveraging Data Science

Upsell and Cross-Sell

New Product Offerings

Data Monetization

Page 36: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Summary: Adapt to Your Customer with More Data Science

Objective:
• Combine existing models with new models derived from big data sources

Data:
• Existing EDW sources
• New big data sources that capture subscriber behavior, including machine-generated sources such as DPI & VOD set-top box data

Data Science Methodology:
• Micro-segmentation via clustering

Business Impact & Improvement:
• Reduce operational and financial dependence on survey data
• Lay foundation for data monetization
• Generate tailored upsell & cross-sell opportunities
• Real, customer-behavior-driven guidance for product & app development

Page 37: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Let’s Get Started: Transforming Your Business with Data Science

Page 38: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Process in New World Order

Page 39: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Pivotal Data Science Labs: Packaged Services

LAB PRIMER (2-Week Roadmapping)

• Analytics Roadmap

• Prioritized Opportunities

• Architectural Recommendations

LAB 600 (6-Week Lab)

• Prof. services

• Data science model building

• Ready-to-deploy model(s)

LAB 1200 (12-Week Lab)

• Prof. services

• Data science model building

• Ready-to-deploy model(s)

LAB 100 (Analytics Bundle)

• On-site MPP Analytics Training

• Analytics tool-kit

• Quick insight (2 weeks)

*Pivotal platform priced separately

Page 40: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Thank You

Do you have any questions?

Page 41: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action


Pivotal Sessions at EMC World

Session: The Pivotal Platform: A Purpose-Built Platform for Big-Data-Driven Applications
Presenter: Josh Klahr
Dates/Times: Tue 5:30 - 6:30, Palazzo E; Wed 11:30 - 12:30, Delfino 4005

Session: Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action
Presenter: Noelle Sio
Dates/Times: Tue 10:00 - 11:00, Lando 4205; Thu 8:30 - 9:30, Palazzo F

Session: Pivotal: Operationalizing 1000-node Hadoop Cluster – Analytics Workbench
Presenter: Clinton Ooi, Bhavin Modi
Dates/Times: Tue 11:30 - 12:30, Palazzo L; Thu 10:00 - 11:00 am, Delfino 4001A

Session: Pivotal: for Powerful Processing of Unstructured Data For Valuable Insights
Presenter: SK Krishnamurthy
Dates/Times: Mon 4:00 - 5:00, Lando 4201 A; Tue 4:00 - 5:00, Palazzo M

Session: Pivotal: Big & Fast data – merging real-time data and deep analytics
Presenter: Michael Crutcher
Dates/Times: Mon 1:00 - 2:00, Lando 4201 A; Wed 10:00 - 11:00, Palazzo M

Session: Pivotal: Virtualize Big Data to Make The Elephant Dance
Presenter: June Yang, Dan Baskette
Dates/Times: Mon 11:30 - 12:30, Marcello 4401A; Wed 4:00 - 5:00, Palazzo E

Session: Hadoop Design Patterns
Presenter: Don Miner
Dates/Times: Mon 2:30 - 3:30, Palazzo F; Wed 8:30 - 9:30, Delfino 4005
