Do Agile Data in Just 5 Shocking Steps!

DataKitchen

Copyright © 2015 by DataKitchen, Inc. All Rights Reserved.

by Gil Benghiat

@benghiat @datakitchen_io

Tuesday, May 19

CIC (Cambridge Innovation Center), 1 Broadway, Cambridge, MA

Agenda

• Gil & DataKitchen

• A look at Agile through Data lenses

• How to do Agile Data

Gil Benghiat – decades working with data

• Network Management Data

• Database Management

• Clinical Trial Data

• Pharmaceutical Sales Data

• Data Liberation

• Data Preparation

Solid Oak Consulting

Data Analysts And Their Teams Are Spending 60-80% Of Their Time On Data Preparation And Production

This creates an expectation gap

[Figure: diagram comparing "Business Customer Expectation" with "Analyst Reality", each divided into Prepare Data, Analyze, and Communicate]

The business does not think that Analysts are preparing data

Analysts don’t want to prepare data

DataKitchen is on a mission to integrate and organize data to make analysts super-powered.

• Offering

  • Set-up service

  • Software subscription

  • UI to integrate data

• Benefits

  • Data warehouse

  • Eliminate drudgery of repeated integrations

Switch the word “software” to “analytics”

s/software/analytics/

[Slide image from agilemanifesto.org, annotated with “analytics” and “and excel files”]

The switch works for the 12 principles too.

Iterate to improve the analytics.

Iterate to improve the process.
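The `s/software/analytics/` expression above is a sed-style substitution. A minimal Python sketch of the same idea, applied to the four value statements published at agilemanifesto.org, follows. The names `VALUES` and `to_analytics` are illustrative and not part of the talk.

```python
import re

# The four value statements of the Agile Manifesto (agilemanifesto.org).
VALUES = [
    "Individuals and interactions over processes and tools",
    "Working software over comprehensive documentation",
    "Customer collaboration over contract negotiation",
    "Responding to change over following a plan",
]


def to_analytics(text: str) -> str:
    """Apply the slide's s/software/analytics/ substitution to one line."""
    # count=1 mirrors sed without the g flag: only the first match is replaced.
    return re.sub("software", "analytics", text, count=1)


if __name__ == "__main__":
    for value in VALUES:
        print(to_analytics(value))
```

Only the second statement contains the word, so it is the only line that changes.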

Agile methodologies contain a number of practices that can apply to data

Sprints

Stories

Prioritization

Daily Meeting

Defined roles

Retrospectives

Pair Programming

Burn down charts

etc.

The Data Analyst has the central role as the bridge between business and data

What do analysts and data scientists want?

Flexibility & Speed

You need to be fast and produce trustworthy data

Some practices have been difficult to apply to data

Test Driven Development

Branching and merging

Refactoring

Small Releases

Frequent or Continuous Integration

Experimentation for learning

Do agile data in just 5 shocking steps!

❶ Add tests

Types

1. Error – stop the line

2. Warning – investigate later

3. Info – list of changes

Examples

1. Input file row count way below a critical threshold

2. Input file row count a little below a threshold

3. These customers changed territories

And keep adding them with each feature developed!
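The three test types and their examples can be sketched in Python as follows. This is one possible shape, not the talk's implementation: the threshold values, the `CheckResult` and `StopTheLine` names, and the customer data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # stop the line
    WARNING = "warning"  # investigate later
    INFO = "info"        # list of changes


@dataclass
class CheckResult:
    severity: Severity
    message: str


class StopTheLine(Exception):
    """Raised when an error-level test fails, halting the pipeline."""


# Hypothetical thresholds; real values depend on the data feed.
CRITICAL_ROWS = 1_000
EXPECTED_ROWS = 10_000


def check_row_count(row_count: int) -> list[CheckResult]:
    """Classify an input file's row count as an error, a warning, or fine."""
    if row_count < CRITICAL_ROWS:
        msg = f"row count {row_count} is below critical threshold {CRITICAL_ROWS}"
        return [CheckResult(Severity.ERROR, msg)]
    if row_count < EXPECTED_ROWS:
        msg = f"row count {row_count} is below expected {EXPECTED_ROWS}"
        return [CheckResult(Severity.WARNING, msg)]
    return []


def check_territory_changes(old: dict[str, str], new: dict[str, str]) -> list[CheckResult]:
    """Report customers whose territory differs between two loads."""
    changed = sorted(c for c in new if c in old and old[c] != new[c])
    return [CheckResult(Severity.INFO, f"{c}: {old[c]} -> {new[c]}") for c in changed]


def run_tests(results: list[CheckResult]) -> list[CheckResult]:
    """Raise on any error; otherwise return warnings and info for later review."""
    errors = [r for r in results if r.severity is Severity.ERROR]
    if errors:
        raise StopTheLine("; ".join(r.message for r in errors))
    return results
```

Each new feature would add its own check functions, and `run_tests` stays the single place where an error stops the run.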

❷ Manage your transforms like code

Use a source code control system (like Git) to enable:

• Branching

• Merging

• Diff
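Once transforms live in text files, they can be compared like any other code. The sketch below illustrates the "Diff" point using Python's standard `difflib` module rather than Git itself. The two SQL snippets and the file names are hypothetical.

```python
import difflib

# Two hypothetical versions of the same SQL transform, as they might
# appear on a production branch and on a feature branch.
PRODUCTION = """\
SELECT customer_id, territory
FROM sales
WHERE year = 2015
"""

FEATURE = """\
SELECT customer_id, territory, rep_id
FROM sales
WHERE year = 2015
"""


def diff_transforms(old: str, new: str) -> list[str]:
    """Return a unified diff of two transform definitions, line by line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="production/sales.sql",  # hypothetical path
            tofile="feature/sales.sql",       # hypothetical path
            lineterm="",
        )
    )


if __name__ == "__main__":
    print("\n".join(diff_transforms(PRODUCTION, FEATURE)))
```

In day-to-day use the source control system produces this comparison; the point is that it only works when the transform is stored as text.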

❸ Provide a data environment for each branch

The underlying data is needed to develop and test the code/transformations
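A minimal sketch of a per-branch data environment, assuming a database small enough to copy whole. It uses SQLite from Python's standard library as a stand-in for a real warehouse; the `make_branch_environment` helper and the `sales` table are illustrative only.

```python
import sqlite3


def make_branch_environment(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the production database into a private in-memory database."""
    branch_db = sqlite3.connect(":memory:")
    production.backup(branch_db)  # full copy; production is not modified
    return branch_db


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.execute("CREATE TABLE sales (customer TEXT, territory TEXT)")
    prod.execute("INSERT INTO sales VALUES ('acme', 'East')")
    prod.commit()

    # Work on the feature branch's copy without touching production.
    feature = make_branch_environment(prod)
    feature.execute("UPDATE sales SET territory = 'North'")
    feature.commit()

    print(prod.execute("SELECT territory FROM sales").fetchall())
    print(feature.execute("SELECT territory FROM sales").fetchall())
```

A full copy is the simplest option; larger warehouses would likely need sampling or a cheaper cloning mechanism, which this sketch does not cover.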

❹ Support three types of workflows

• Small Team: promote directly to production

• Feature Branch: merge back to production branch

• Data Governance: 3rd party verification before production merge

Review, Test, Approve
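The three workflows can be written down as data so that tooling knows which steps stand between a change and production. In this sketch the step names are illustrative, and attaching the Review, Test, and Approve labels to the Data Governance workflow is an assumption; the slide does not state which workflow they belong to.

```python
from enum import Enum


class Workflow(Enum):
    SMALL_TEAM = "small team"
    FEATURE_BRANCH = "feature branch"
    DATA_GOVERNANCE = "data governance"


# Assumption: review/test/approve are the 3rd party verification steps
# of the Data Governance workflow.
STEPS = {
    Workflow.SMALL_TEAM: ["promote to production"],
    Workflow.FEATURE_BRANCH: ["merge to production branch"],
    Workflow.DATA_GOVERNANCE: [
        "review",
        "test",
        "approve",
        "merge to production branch",
    ],
}


def remaining_steps(workflow: Workflow, completed: set[str]) -> list[str]:
    """List the steps still required, in order, before production."""
    return [step for step in STEPS[workflow] if step not in completed]


if __name__ == "__main__":
    print(remaining_steps(Workflow.DATA_GOVERNANCE, {"review"}))
```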

❺ Give your analysts and data scientists the ability to edit the data warehouse (DW) safely

Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry-average companies take 60 days; and laggards average 143 days

Source: Aberdeen Group: Data Management for BI: Fueling the analytical engine with high-octane information

Figure out how to do this in minutes
