Do Agile Data in Just 5 Shocking Steps!
DataKitchen
Do agile data in just 5 shocking steps!
Copyright © 2015 by DataKitchen, Inc. All Rights Reserved.
by Gil Benghiat
@benghiat
@datakitchen_io
Tuesday, May 19 CIC (Cambridge Innovation Center)
1 Broadway, Cambridge, MA
Gil Benghiat – decades working with data
• Network Management Data
• Database Management
• Clinical Trial Data
• Pharmaceutical Sales Data
• Data Liberation
• Data Preparation
Solid Oak Consulting
Data analysts and their teams are spending 60-80% of their time on data preparation and production
This creates an expectation gap
[Figure: share of time spent on Analyze, Prepare Data, and Communicate, comparing Business Customer Expectation with Analyst Reality]
The business does not think that Analysts are preparing data
Analysts don’t want to prepare data
DataKitchen is on a mission to integrate and organize data to make analysts super-powered.
• Offering
    • Set-up service
    • Software subscription
    • UI to integrate data
• Benefits
    • Data warehouse
    • Eliminate drudgery of repeated integrations
agilemanifesto.org
and Excel files
s/software/analytics/ The switch works for the 12 principles too.
Iterate to improve the analytics.
Iterate to improve the process.
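The s/software/analytics/ notation above is a sed-style substitution. As a minimal sketch, the same switch can be applied in Python to one value and one principle quoted from agilemanifesto.org. The function name to_analytics is made up for this illustration.

```python
import re


def to_analytics(text):
    """Apply the s/software/analytics/ switch to every whole-word match."""
    return re.sub(r"\bsoftware\b", "analytics", text)


# One value and one principle from the Agile Manifesto, before the switch.
statements = [
    "Working software over comprehensive documentation",
    "Working software is the primary measure of progress.",
]

for statement in statements:
    print(to_analytics(statement))
```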
Agile methodologies contain a number of practices that can apply to data
Sprints
Stories
Prioritization
Daily Meeting
Defined roles
Retrospectives
Pair Programming
Burn down charts
etc.
The Data Analyst has the central role as the bridge between business and data
What do analysts and data scientists want?
Flexibility & Speed
You need to be fast and produce trustworthy data
Some practices have been difficult to apply to data
Test Driven Development
Branching and merging
Refactoring
Small Releases
Frequent or Continuous Integration
Experimentation for learning
❶ Add tests
Types
1. Error – stop the line
2. Warning – investigate later
3. Info – list of changes
Examples
1. Input file row count way below a critical threshold
2. Input file row count a little below a threshold
3. These customers changed territories
And keep adding them with each feature developed!
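The three test types and their examples could be sketched roughly as follows. This is an illustration only, not DataKitchen's implementation: the names Severity, Finding, check_row_count, territory_changes, and run_tests, and the two threshold parameters, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # stop the line
    WARNING = "warning"  # investigate later
    INFO = "info"        # list of changes


@dataclass
class Finding:
    severity: Severity
    message: str


def check_row_count(row_count, critical_min, expected_min):
    """Classify an input file's row count against two assumed thresholds."""
    if row_count < critical_min:
        return Finding(Severity.ERROR, f"row count {row_count} is below critical minimum {critical_min}")
    if row_count < expected_min:
        return Finding(Severity.WARNING, f"row count {row_count} is below expected minimum {expected_min}")
    return None


def territory_changes(previous, current):
    """Report customers whose territory differs between two loads (info only)."""
    changed = sorted(c for c in current if c in previous and previous[c] != current[c])
    if not changed:
        return None
    return Finding(Severity.INFO, "customers changed territories: " + ", ".join(changed))


def run_tests(results):
    """Raise on any error so the pipeline stops; return the rest for review."""
    findings = [r for r in results if r is not None]
    errors = [r for r in findings if r.severity is Severity.ERROR]
    if errors:
        raise RuntimeError("; ".join(r.message for r in errors))
    return findings
```

Each new feature would add another check function to the list passed to run_tests, so the test suite grows with the transformations.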
❷ Manage your transforms like code
Use a source code control system (like Git) to enable:
• Branching
• Merging
• Diff
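Diff only works when transforms live as text files. As a rough sketch of what that buys you, the snippet below compares two versions of a SQL transform using Python's standard difflib module rather than Git itself. The SQL, the file name, and the production/ and feature/ labels are invented for the example.

```python
import difflib


def diff_transform(old_sql, new_sql, name="transform.sql"):
    """Return a unified diff between two versions of a text transform."""
    return "\n".join(
        difflib.unified_diff(
            old_sql.splitlines(),
            new_sql.splitlines(),
            fromfile=f"production/{name}",
            tofile=f"feature/{name}",
            lineterm="",
        )
    )


# Hypothetical transform as it exists on the production branch.
production_version = """SELECT customer, SUM(amount) AS total
FROM sales
WHERE region = 'east'
GROUP BY customer"""

# The same transform after an edit on a feature branch.
feature_version = """SELECT customer, SUM(amount) AS total
FROM sales
WHERE region IN ('east', 'north')
GROUP BY customer"""

print(diff_transform(production_version, feature_version))
```

A reviewer sees exactly one changed line, which is the kind of visibility a source code control system gives for transforms stored as code.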
❸ Provide a data environment for each branch
The underlying data is needed to develop and test the code/transformations
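One way to picture a data environment per branch is a private copy of the data for every branch name. The sketch below uses in-memory SQLite databases from Python's standard library as a stand-in for a real warehouse. The class BranchEnvironments and the branch name are hypothetical, and a full copy is only one possible strategy.

```python
import sqlite3


class BranchEnvironments:
    """Hand out one isolated copy of the production data per branch."""

    def __init__(self, production):
        self.production = production
        self._envs = {}

    def for_branch(self, branch):
        # Create the copy the first time a branch asks for its environment.
        if branch not in self._envs:
            env = sqlite3.connect(":memory:")
            self.production.backup(env)  # full copy of the production data
            self._envs[branch] = env
        return self._envs[branch]


# Stand-in production database with a small sales table.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
production.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("acme", 10.0), ("bolt", 20.0)]
)
production.commit()

environments = BranchEnvironments(production)
dev = environments.for_branch("feature/new-territories")

# Changes made on the branch copy do not touch production.
dev.execute("DELETE FROM sales WHERE customer = 'acme'")
dev.commit()
```

After this runs, the branch copy holds one row while production still holds two, so code on the branch can be developed and tested against real data without risk.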
❹ Support three types of workflows
• Small Team: promote directly to production
• Feature Branch: merge back to production branch
• Data Governance: 3rd party verification before production merge (review, test, approve)
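The three workflows differ mainly in which steps must finish before a change reaches production. A minimal sketch of that idea follows. The step names and the REQUIRED_STEPS table are assumptions made for illustration; in particular, requiring a test step in the small-team workflow is a guess based on step ❶, not something the slide states.

```python
from enum import Enum


class Workflow(Enum):
    SMALL_TEAM = "small team"
    FEATURE_BRANCH = "feature branch"
    DATA_GOVERNANCE = "data governance"


# Assumed steps per workflow, in the order they would normally happen.
REQUIRED_STEPS = {
    Workflow.SMALL_TEAM: ["test"],
    Workflow.FEATURE_BRANCH: ["test", "merge"],
    Workflow.DATA_GOVERNANCE: ["review", "test", "approve", "merge"],
}


def missing_steps(workflow, completed):
    """List the required steps that have not been completed yet."""
    return [step for step in REQUIRED_STEPS[workflow] if step not in completed]


def can_promote(workflow, completed):
    """A change may go to production only when nothing is missing."""
    return not missing_steps(workflow, completed)
```

For example, can_promote(Workflow.DATA_GOVERNANCE, {"review", "test"}) is False until the third-party approval and the merge are recorded.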
❺ Give your analysts and data scientists the ability to edit the DW safely
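The slide does not say how "safely" is achieved. One possible reading, sketched here under that assumption, is to apply an edit inside a transaction and keep it only if a test query passes. The example uses SQLite from Python's standard library as a stand-in for the DW, handles data changes (not schema changes), and the function name apply_change_safely is hypothetical.

```python
import sqlite3


def apply_change_safely(conn, change_sql, check_sql):
    """Run change_sql, then commit only if check_sql returns a truthy value.

    check_sql is expected to return a single row whose first column is
    truthy when the data is still valid. Anything else rolls the change back.
    """
    try:
        conn.execute(change_sql)
        passed = bool(conn.execute(check_sql).fetchone()[0])
    except sqlite3.Error:
        passed = False
    if passed:
        conn.commit()
    else:
        conn.rollback()
    return passed


# Stand-in warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("acme", 10.0), ("bolt", 20.0)]
)
warehouse.commit()

# Test: no sale may have a negative amount.
no_negative_amounts = "SELECT MIN(amount) >= 0 FROM sales"

bad = apply_change_safely(warehouse, "UPDATE sales SET amount = -amount", no_negative_amounts)
good = apply_change_safely(warehouse, "UPDATE sales SET amount = amount * 2", no_negative_amounts)
```

Here the first edit is rolled back because it breaks the test, while the second is committed.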
Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry-average companies take 60 days; and laggards average 143 days.
Source: Aberdeen Group, "Data Management for BI: Fueling the analytical engine with high-octane information"
Figure out how to do this in minutes