Building Better Analytics Workflows (Strata-Hadoop World 2013)
![Page 1: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/1.jpg)
Strata-Hadoop World 2013
Building better analytics workflows
![Page 2: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/2.jpg)
www.datapad.io
Wes McKinney
• Former quant @ AQR (a hedge fund)
• Creator of the pandas project for Python
• Author of Python for Data Analysis (O’Reilly)
• Founder and CEO of DataPad
@wesmckinn
![Page 3: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/3.jpg)
• > 20k copies since Oct 2012
• Bringing many new people to Python and data analysis with code
![Page 4: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/4.jpg)
Why are so many people learning to program?
• Increasing data scale
• More and more data munging/integration
• Need for statistics and predictive analytics
• Building complex data visualizations
• Inadequacy of Excel or other UI-driven data tools
![Page 5: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/5.jpg)
The Analytics Workflow
Acquisition → Preparation → Visualization → Analysis → Sharing
![Page 7: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/7.jpg)
What do we care about?
• Minimize time to answer
• Ask more questions
• Reduce friction between tools and processes
• Team productivity
![Page 9: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/9.jpg)
What can go wrong?
• Inefficient workflows lead to lower-quality analysis
• Results may not be actionable in a reasonable time frame
![Page 11: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/11.jpg)
Three types of problems
• Tooling
• Workflow management
• Collaboration
![Page 15: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/15.jpg)
For programmers, luckily it’s not 2005 anymore
• R: Hadley Wickham’s packages
• Python: pandas
• Hadoop: Pig
![Page 16: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/16.jpg)
Data preparation with visual tools
• OpenRefine (formerly Google Refine)
• Google Fusion Tables
• Microsoft Excel
• Data Wrangler
![Page 17: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/17.jpg)
Some new startups building data preparation tools
![Page 18: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/18.jpg)
Business Intelligence: essential for doing business
![Page 19: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/19.jpg)
BI macro-trends
• Self-service BI
• Visual discovery
• SQL on Hadoop
![Page 24: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/24.jpg)
Predictive analytics pitfalls
• Signal vs. noise
• Identify the right patterns
• Uncertain ROI
![Page 25: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/25.jpg)
Some analytics workflow problems still need work
![Page 27: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/27.jpg)
Friction between tools: a typical scenario
• Excel and SQL for data wrangling
• Tableau for visualization
• SPSS/R for modeling
![Page 30: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/30.jpg)
Data workflows as dependency graphs?
[Diagram: dependency graph with nodes A, B, C, D, E, F]
![Page 31: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/31.jpg)
Data workflows as dependency graphs?
CHRONOS
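One way to make the dependency-graph idea concrete is a topological sort: each step runs only after the steps it depends on. The sketch below is a minimal illustration, not how Chronos or any particular scheduler works. It uses Python's standard-library `graphlib` (Python 3.9+); the edges between steps A through F are hypothetical, since the slide's diagram did not survive in this transcript.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: the slide names steps A-F, but the edges between them
# are an assumption made purely for illustration.
# Each key maps a step to the set of steps it depends on.
dependencies = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
    "E": {"D"},
    "F": {"D"},
}


def run_order(deps):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Any order this prints satisfies the dependencies; steps with no path between them (such as B and C here) may appear in either order.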
![Page 34: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/34.jpg)
Leveraging diverse skill sets
• Within teams, different competencies
• Working together on a data project: sharing code and data, tracking changes
![Page 37: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/37.jpg)
Make an impact
• Getting results into the hands of the people who need them
• Getting models "into production"
![Page 42: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/42.jpg)
Accessible data science... with training wheels
![Page 44: Building Better Analytics Workflows (Strata-Hadoop World 2013)](https://reader036.fdocuments.us/reader036/viewer/2022062419/557ab0d2d8b42a9f2e8b508e/html5/thumbnails/44.jpg)
• http://datapad.io
• Founded in 2013, located in SF
• In private beta, join us!
• Hiring for engineering