Asking the Right Questions of Your Data
![Page 1: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/1.jpg)
Copyright © Think Big Analytics and Neustar Inc.
Asking the Right Questions of your Data
Mike Peterson, VP of Platforms and Data Architecture, Neustar
Jun 26, 2013
![Page 2: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/2.jpg)
Copyright © Neustar Inc.
![Page 3: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/3.jpg)
We have come a long way!!!
But where/when is the GOLD?
» Unintended Consequences of Big Data
» We need to ask the right Questions
» Oh, and let's remember religion
» And not forget GOVERNANCE
![Page 4: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/4.jpg)
Big Data Evolution Status
» New data platform is built (3-tier)
» Collected many PBs of data
» Hadoop infrastructure in place for 2 years
» Established Data Science teams
» Machine Learning is in place
» Increased technology skills
» Focused data teams
» Active in the community
![Page 5: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/5.jpg)
Our Partners are still a part of our process
» Expertise in Technologies
» Trusted partner
» Collaborative Teams

» Open source leader
» Invested in client success
» Price/performance
![Page 6: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/6.jpg)
Some Unintended Consequences
» More Customer Reporting Requests
  » Because we suddenly have lots of customer data available
  » Meaning more work for the DW team!!!
» A DR Site is needed more than ever
  » More data means more critical data to protect
  » Network stress to support DR and other additional access
» Data Governance is overwhelmed with requests
  » Retention Policies need to be rethought
![Page 7: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/7.jpg)
Questions
» Customer-Driven Questions
  » Easy to understand
» Subject Questions
  » Discover the pivot and you have a good start
» Exploratory Questions
  » Thinking of the unformed questions
  » Working from the top down
  » Narrowing the answer before you test all the data
![Page 8: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/8.jpg)
Questions - Approaches
• Understand which manual process you want to automate: identify what is currently predicted by hand that could be automated, and determine whether there is any way to get training data consisting of <input, output> pairs.
• Consider methods to augment existing data with a “pivot” column that can be used as a join key. For example, geolocating an IP address could lead to a join with Census data based on ZIP+4.
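
As a rough illustration of the “pivot” column idea, here is a minimal Python sketch. It assumes two hypothetical lookup tables (`IP_TO_ZIP4` and `CENSUS_BY_ZIP4`) with made-up values; in practice these would come from a geo-IP source and Census data. The function names are also only for illustration.

```python
# Hypothetical lookup tables with invented values, for illustration only.
IP_TO_ZIP4 = {
    "203.0.113.7": "20166-6503",
    "198.51.100.23": "94105-1804",
}
CENSUS_BY_ZIP4 = {
    "20166-6503": {"median_income": 98000, "households": 41},
    "94105-1804": {"median_income": 121000, "households": 18},
}


def add_pivot_column(events):
    """Return copies of the events with a 'zip4' pivot column derived from the IP."""
    return [{**e, "zip4": IP_TO_ZIP4.get(e["ip"])} for e in events]


def join_on_pivot(events, reference, key="zip4"):
    """Left-join each event to the reference table on the pivot column."""
    joined = []
    for e in events:
        extra = reference.get(e[key], {})  # no match: keep the event unchanged
        joined.append({**e, **extra})
    return joined


events = [
    {"ip": "203.0.113.7", "page": "/pricing"},
    {"ip": "192.0.2.99", "page": "/home"},  # IP missing from the lookup table
]
enriched = join_on_pivot(add_pivot_column(events), CENSUS_BY_ZIP4)
for row in enriched:
    print(row)
```

Events whose IP cannot be located keep a `zip4` of `None` and simply gain no Census fields, so the join never drops rows.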
![Page 9: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/9.jpg)
Questions - Approaches
• Determine if your problem is one of prediction or one of grouping (clustering). Grouping is more of a task that leads to better understanding of the data than one that directly solves a business problem.
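
To make the distinction concrete, the sketch below contrasts the two on made-up one-dimensional data. Prediction needs <input, output> pairs (here a simple nearest-neighbor lookup), while grouping works on unlabeled values (here a tiny k-means with k=2). The data, labels, and function names are assumptions chosen for illustration, not a method taken from the talk.

```python
def predict_nearest(train_pairs, x):
    """Prediction: return the output of the training pair whose input is closest to x."""
    _, nearest_output = min(train_pairs, key=lambda p: abs(p[0] - x))
    return nearest_output


def cluster_two_groups(values, iterations=10):
    """Grouping: 1-D k-means with k=2, seeded with the min and max values."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Recompute each center as its group's mean (keep the old center if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Made-up monthly spend values; labels exist only for the prediction case.
labeled = [(12.0, "stays"), (15.0, "stays"), (80.0, "churns"), (95.0, "churns")]
print(predict_nearest(labeled, 20.0))

unlabeled = [12.0, 15.0, 14.0, 80.0, 95.0, 90.0]
print(cluster_two_groups(unlabeled))
```

The prediction answers a direct question about a new input; the grouping only reveals that the values fall into two clusters, and someone still has to interpret what those clusters mean.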
![Page 10: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/10.jpg)
Questions - Approaches
• Determine if you are more interested in finding “interesting” relationships among data columns than in the columns themselves. This is a task I’d call more “discovery” than prediction, but the idea is to model one column as the output in terms of the other columns as inputs.
• Doing this for all output columns can lead to “discovery” of the strongest correlations (e.g., every time a customer buys beer at 5 PM, he is likely to buy diapers). This is more of a fishing expedition, but it can lead to unusual insights.
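
A simplified sketch of this kind of “fishing expedition” is shown below. It tries every column as the output of every other single column and ranks the pairs by confidence, meaning the share of rows where the output is present given that the input is present. The basket data is invented, and single-input confidence is a stand-in for whatever model one would actually fit. It also ignores how common the output is overall.

```python
from itertools import permutations


def confidence(inputs, outputs):
    """Share of rows with the input flag set that also have the output flag set."""
    hits = [o for i, o in zip(inputs, outputs) if i]
    return sum(hits) / len(hits) if hits else 0.0


def rank_relationships(table):
    """Try every column as the output of every other column; strongest first."""
    scored = [
        (inp, out, confidence(table[inp], table[out]))
        for inp, out in permutations(table, 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Made-up baskets: 1 means the item (or condition) was present in the purchase.
baskets = {
    "beer_at_5pm": [1, 1, 1, 0, 0, 0],
    "diapers":     [1, 1, 1, 1, 0, 0],
    "bread":       [0, 1, 0, 1, 1, 0],
}
for inp, out, score in rank_relationships(baskets):
    print(f"{inp} -> {out}: {score:.2f}")
```

Because the measure is directional, “beer at 5 PM implies diapers” and “diapers implies beer at 5 PM” get different scores, which matches the idea of choosing one column as the output.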
![Page 11: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/11.jpg)
Impetus Approach to Questioning Data
[Diagram boxes: Existing Data Property; Business Strategy; Customer Problem Statements; Analysis of Data Property; Discussion with Stakeholders; Analysis of Problem Statement; Data Needs Statement; Refined Problem Statement; Data Analytics Plan]
![Page 12: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/12.jpg)
Who knew there was religion in Analytics
» Statistical Analysis vs. Machine Learning
  » Stats people think “truth”
  » Machine Learning people think “near truth”
» Truth is easy to bound
  » Cost models make sense to the organization
» Near Truth is hard to explain and bound
  » It is where the real exploration happens
  » But it can consume the Data Scientist
» Both can net real returns, and they need to coexist
![Page 13: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/13.jpg)
![Page 14: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/14.jpg)
GOVERNANCE
» Don’t forget about Governance
  » Contracts
  » PII
  » Brand
» CPO & CISO are your friends - honestly
  » Protect your CUSTOMER DATA
» It will slow you down in the beginning
  » But you want your results to be reputable
» We need to get to a policy framework at some point that is automated
![Page 15: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/15.jpg)
About Impetus
» Accelerated consulting and services leader for Big Data; Headquartered in San Jose since 1996; 1400+; Presence in Silicon Valley, Atlanta, NYC; offices in India; Expertise through Architects
» Pioneers in distributed software engineering with vertical and functional expertise; Dedicated innovation labs; 200+ Big Data practitioners; 80+ dedicated to R&D
![Page 16: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/16.jpg)
Data Science Approach
Drill
* Incoming Question
* Problem Landscape
* Underlying Constraints
* Specific Goals
Assess
* Goal Driven Hypotheses
* Data Requirement
* Resource Requirements
* Analysis Plan
Target
* Data Collection
* Quality Assessment
* Cross Validation
* Restructuring
Analyze
* Test Previous Hypotheses
* Explore New Hypotheses
* Test
* Quantify Results
Recommend
* Summary of Results
* Key Novel Insights
* Impact Analysis
* Action Items
![Page 17: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/17.jpg)
Data Science Focus Areas
» Recommender Systems
» Sentiment Analysis
» Topic Identification
» Predictive Analytics
» Data Stream Analytics
Contact us at [email protected]
![Page 18: Asking the Right Questions of Your Data](https://reader034.fdocuments.us/reader034/viewer/2022051817/5487ba7db47959ce0c8b5564/html5/thumbnails/18.jpg)
Thank you
Questions?