Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action
1 © Copyright 2013 EMC Corporation. All rights reserved.
Pivotal Data Scientists on the Front Line: Examples of Data Science in Action
Getting to Know Your Customer with Big & Fast Data
Pivotal Data Science Team
Welcome – It’s a Pleasure to Meet You
• The Launch of Pivotal
• Pivotal Data Science Team
• Getting to Know Your Customer
  - Meet your customer: Build Models
  - Learn more about your customer: More Data
  - Adapt to your customer: Dynamic Models
• Let’s Get Started: Pivotal Data Science Labs
• Q & A
A NEW PLATFORM FOR A NEW ERA
Pivotal, The New EMC Spin-out
Private Cloud
Public Cloud
Pivotal is building a new platform for a new era. This platform enables customers to build a new class of applications that leverage Big and Fast Data, all with the power of cloud independence.
Introducing the Pivotal Stack
Cloud Storage
Virtualization
Data & Analytics Platform
Cloud Application Platform
Data-Driven Application Development
Pivotal Data Science Labs
Pivotal Services: Rapid Time to Value
Pivotal Labs: Quickly create and deploy new applications
• Proven methodology to remove risk and accelerate results
Pivotal Data Science Labs: A proven data science practice to accelerate analytics projects
• Drive business value through data analytics
Open Source Support: Collaborative and customer-driven open source support, services and co-development
Pivotal Data Science
Tell Me About Data Science
What it is:
– Data preparation
– Data exploration and visualization
– Feature creation based on data and domain knowledge
– Quantitative modeling & model validation
– Scoring data
What it is not:
– A set of tools
– Application development
Platform-Driven Data Science Paradigm Shift
1. Modeling on more data
2. Rapid ingestion of new data
3. Re-use of valuable data
4. Faster model building
5. Scalable advanced modeling
6. Faster model refreshing
7. Faster data scoring
Pivotal Data Science Knowledge Development
Data Science Strategy
Point Model Development
Multiple Model Development
Transformation to “Predictive Enterprise”
Pivotal Data Science Labs
Getting to Know Your Customer: Deeper Insights With Data Science
More Data Science, Deeper Insights
• Meet Your Customer: Build Models
• Learn More About Your Customer: More Data
• Adapt to Your Customer: Dynamic Models
The New Normal: “An Audience of One”
[Figure: data ecosystem diagram]
DATA DEVICES: CONTENT, GOVERNMENT, PHONE/TV, INTERNET, AD AGENCY, RETAIL
Data Aggregators: Analytic Services, Advertising, Catalog Co-ops, List Brokers, Websites, Information Brokers, Credit Bureaus, Media Archives
Data Users/Buyers: Media, Banks, Delivery Services, Marketers, Government, Individuals, Employers
Data-driven Customer Analytics
Unified data supporting re-usable predictive models
• Targeting & Retention, Social Media Analysis, Campaign Optimization
• Transaction History, Purchases, Clickstream, Customer Data
• Data Size: GB, TB, PB
Who Are Our Customers?
• One way of learning about customers is to divide them into characteristic groups
• This is called segmentation
• Let’s take a look at a segmentation exercise Pivotal did with a large medical insurance company…
What Did We Have to Work With?
• Product Sales
• Claims Data
• Provider Information
• Consumer Data
• Population Served
So What Did We Do With this Data?
• Before – Random Clusters
• After – Cohesive Clusters
What Was the Outcome?
• New Clinics
• Neighborhood Clinics
• Pirate Clinics
• Established Clinics
Where to Next? Segmentation as Foundational Analytics
• Churn Models
• Micro-segmentation
• Marketing Mix Model
• Cross-Sell/Up-Sell Optimization
• Lifetime Value Calculation
• Consumer-Provider Recommendation Engine
Summary: Get to Know Your Customer by Building Data-Driven Models
Objective:
• Improve understanding of customer
Data:
• Existing EDW sources
• New big data sources that capture customer demographics, such as the publicly available US Census
Data Science Methodology:
• Segmentation via k-means clustering
Business Impact & Improvement:
• Dramatically increase familiarity with makeup and behavior of customer base
• Drive targeted marketing efforts
• Lay foundation for higher-quality future models
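The segmentation step named in this summary, k-means clustering, can be illustrated with a minimal sketch. It is not the team's implementation: the feature names and values are hypothetical, the data is assumed to be scaled already, and the centroids are initialized from evenly spaced rows purely to keep the example deterministic (a real run would more likely use random restarts or k-means++ on an MPP platform).

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )

def kmeans(points, k, max_iter=50):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: deterministic init from evenly spaced rows, for illustration only.
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical scaled features per customer: (claims per year, median census-tract income)
customers = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.1), (0.9, 0.8), (0.85, 0.9), (0.8, 0.95)]
centroids, segments = kmeans(customers, k=2)
print(segments)
```

Each entry of `segments` is the cluster index for the customer in the same position, which is the label that downstream models (churn, cross-sell) would consume as a feature.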
Churn Models for Telecom Industry
Goal
– Identify customers who are likely to churn and prevent them from leaving
Challenges
– Cost of acquiring new customers is high
– Acquisition cost is hard to recoup if a customer is not retained long enough
– Barrier to switching providers is low
– With mobile number portability, the barrier to switching is even lower
Good News
– Cost of retaining existing customers is lower!
Structured Features for Churn Models
The problem is extensively studied with a rich set of approaches in the literature
• Device
• Texting Stats
• Call Stats
• Rate Plans
• Customer Demographics
These features are great, but the models soon hit a plateau with structured features!
Blending the Unstructured with the Structured
What other sources of previously untapped data could we use?
• Are our customers happy? Where? What segments?
• What are the common topics in their conversations online?
Sentiment Analysis and Topic Models
[Figure: pipeline diagram]
• Unstructured Data (External, Internal)
• Sentiment Analysis Engine (Classifier)
• Topic Engine (LDA)
• Topic Dashboard
• Structured Data: EDW
• BETTER PREDICT LIKELIHOOD TO CHURN
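As a rough illustration of how a sentiment classifier's output could be blended with structured features, here is a small sketch. The slide says only "Classifier", so the multinomial Naive Bayes model is a stand-in chosen for brevity, and the memos, labels, and feature names are all hypothetical.

```python
import math
from collections import Counter

def train(memos):
    """Count words per label from (text, label) pairs."""
    word_counts, label_counts = {}, Counter()
    for text, label in memos:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def negative_probability(text, word_counts, label_counts):
    """Posterior probability of the 'negative' label under multinomial Naive Bayes."""
    vocab = set().union(*word_counts.values())
    words = [w for w in text.lower().split() if w in vocab]  # ignore unseen words
    log_post = {}
    for label, n_docs in label_counts.items():
        total = sum(word_counts[label].values())
        log_post[label] = math.log(n_docs / sum(label_counts.values())) + sum(
            math.log((word_counts[label][w] + 1) / (total + len(vocab)))  # Laplace smoothing
            for w in words
        )
    # Normalize in a numerically stable way.
    top = max(log_post.values())
    weights = {label: math.exp(lp - top) for label, lp in log_post.items()}
    return weights["negative"] / sum(weights.values())

# Hypothetical labeled call-center memos
memos = [
    ("dropped calls again very frustrated", "negative"),
    ("billing error wants to cancel", "negative"),
    ("happy with new plan", "positive"),
    ("thanks for quick help", "positive"),
]
word_counts, label_counts = train(memos)

# Hypothetical structured row from the EDW, extended with the text-derived score
structured = {"monthly_minutes": 420, "rate_plan": "family"}
memo = "frustrated about dropped calls"
features = dict(structured, memo_negative_prob=negative_probability(memo, word_counts, label_counts))
print(features)
```

The blended `features` row is what a churn model would then train on; topic proportions from a topic model could be appended as additional columns in the same way.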
Topic Clouds from Twitter - An Example
• Baby shower & Coupons: 13%
• Convenience: 26%
• Store experience: 13%
• Promotions, deals: 17%
• Misc: 32%
Summary: More Data to Drive Additional Customer Insights
Objective:
• Improve accuracy of churn models by blending structured features with unstructured text
Data:
• Existing structured features (call data records, device type, rate plans etc.)
• Call center memos
Data Science Methodology:
• Sentiment Analysis and Topic Modeling
Business Impact & Improvement:
• Achieved 16% improvement in ROC curve for Churn prediction
• Topic Models automatically identified common themes in call center memos
• Laid foundation for Text Analytics
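One way to quantify an improvement like the one reported here is to compare the area under the ROC curve (AUC) before and after adding text features. The sketch below assumes that "improvement in ROC curve" refers to AUC, which the slide does not state, and uses made-up labels and scores that are not meant to reproduce the 16% figure.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random churner outscores a random non-churner."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical holdout set: 1 = churned, 0 = stayed
labels = [1, 1, 1, 0, 0, 0]
structured_only = [0.8, 0.4, 0.3, 0.5, 0.2, 0.1]  # scores from structured features alone
blended = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]          # scores after adding text features

auc_structured = roc_auc(labels, structured_only)
auc_blended = roc_auc(labels, blended)
relative_lift = (auc_blended - auc_structured) / auc_structured
print(f"structured: {auc_structured:.3f}, blended: {auc_blended:.3f}, lift: {relative_lift:.1%}")
```

Whether a lift is reported as a relative or an absolute change in AUC makes a large difference to the headline number, so it is worth stating which one is meant.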
State of Data at Telco Company
Customer Segments
• Multi-Gadget Families
• Affluent Matures
• Thrifty Families
• High Tech Singles
• Budget Singles
• Seniors
New Data Sources
• Internet Deep Packet Inspection
• TV Consumption (Linear)
• Video On Demand Consumption
Understanding Subscriber Behavior
• Native Services: Video On Demand, TV, Internet
• Internet Devices
• OTT Services
What is the level of engagement with Client’s products (TV, VOD, Internet)?
What are the patterns of device usage behavior?
What is the level of OTT engagement, by segment, and by bandwidth?
Newly Identified Behavior-Based Segments
[Figure: chart of subscribers per segment]
• Moderates
• OTT & Data Heavyweights
• Portable OTT Entertainment Seekers: iPhone Heavy, Android Heavy, iPad Heavy
• In-Home OTT Entertainment Seekers
• In-Home Native Content Seekers: VOD Heavy, TV Heavy
Going Further: Crossing Behavior-Based Segments on Existing Customer Segments
Newly Discovered Usage-Based Segments
• Moderates
• OTT & Data Heavyweights
• In-Home OTT Entertainment Seekers
• Portable OTT Entertainment Seekers - iPhone Heavy
• Portable OTT Entertainment Seekers - Android Heavy
• Portable OTT Entertainment Seekers - iPad Heavy
• In-Home Native Content Seekers - VOD Heavy
• In-Home Native Content Seekers - TV Heavy
Existing Segments
• Multi-Gadget Families
• Affluent Matures
• Thrifty Families
• Budget Singles
• Seniors
• High Tech Singles
Customized Micro-Segments!
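Crossing the two segmentations amounts to a cross-tabulation: each subscriber carries one existing segment and one usage-based segment, and every observed pair is a candidate micro-segment. A minimal sketch follows, with hypothetical subscriber records (the segment names come from the slides, the assignments do not).

```python
from collections import Counter

# Hypothetical subscribers: (existing segment, newly discovered usage-based segment)
subscribers = [
    ("Multi-Gadget Families", "OTT & Data Heavyweights"),
    ("Multi-Gadget Families", "OTT & Data Heavyweights"),
    ("Seniors", "In-Home Native Content Seekers - TV Heavy"),
    ("High Tech Singles", "Portable OTT Entertainment Seekers - iPhone Heavy"),
    ("Seniors", "Moderates"),
]

# Each distinct pair is one micro-segment; the count is its size.
micro_segments = Counter(subscribers)

for (existing, behavior), size in micro_segments.most_common():
    print(f"{existing} x {behavior}: {size}")
```

With 6 existing segments and 8 usage-based segments there are at most 48 cells, so the full cross-tab stays small enough to review by hand before deciding which micro-segments merit tailored offers.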
Driving New Business Value by Leveraging Data Science
• Upsell and Cross-Sell
• New Product Offerings
• Data Monetization
Summary: Adapt to Your Customer with More Data Science
Objective:
• Combine existing models with new models derived from big data sources
Data:
• Existing EDW sources
• New big data sources that capture subscriber behavior, including machine-generated sources such as DPI & VOD set-top box data
Data Science Methodology:
• Micro-segmentation via clustering
Business Impact & Improvement:
• Reduce operational and financial dependence on survey data
• Lay foundation for data monetization
• Generate tailored upsell & cross-sell opportunities
• Real, customer-behavior-driven guidance for product & app development
Let’s Get Started: Transforming Your Business with Data Science
Process in New World Order
Pivotal Data Science Labs: Packaged Services
LAB PRIMER (2-Week Roadmapping)
• Analytics Roadmap
• Prioritized Opportunities
• Architectural Recommendations
LAB 600 (6-Week Lab)
• Prof. services
• Data science model building
• Ready-to-deploy model(s)
LAB 1200 (12-Week Lab)
• Prof. services
• Data science model building
• Ready-to-deploy model(s)
LAB 100 (Analytics Bundle)
• On-site MPP Analytics Training
• Analytics tool-kit
• Quick insight (2 weeks)
*Pivotal platform priced separately
Thank You. Do you have any questions?
Pivotal Sessions at EMC World
Session | Presenter | Dates/Times
The Pivotal Platform: A Purpose-Built Platform for Big-Data-Driven Applications | Josh Klahr | Tue 5:30 - 6:30, Palazzo E; Wed 11:30 - 12:30, Delfino 4005
Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action | Noelle Sio | Tue 10:00 - 11:00, Lando 4205; Thu 8:30 - 9:30, Palazzo F
Pivotal: Operationalizing 1000-node Hadoop Cluster – Analytics Workbench | Clinton Ooi, Bhavin Modi | Tue 11:30 - 12:30, Palazzo L; Thu 10:00 - 11:00 am, Delfino 4001A
Pivotal: for Powerful Processing of Unstructured Data For Valuable Insights | SK Krishnamurthy | Mon 4:00 - 5:00, Lando 4201 A; Tue 4:00 - 5:00, Palazzo M
Pivotal: Big & Fast data – merging real-time data and deep analytics | Michael Crutcher | Mon 1:00 - 2:00, Lando 4201 A; Wed 10:00 - 11:00, Palazzo M
Pivotal: Virtualize Big Data to Make The Elephant Dance | June Yang, Dan Baskette | Mon 11:30 - 12:30, Marcello 4401A; Wed 4:00 - 5:00, Palazzo E
Hadoop Design Patterns | Don Miner | Mon 2:30 - 3:30, Palazzo F; Wed 8:30 - 9:30, Delfino 4005