Breaking Data Science Open
© 2016 Continuum Analytics - Confidential & Proprietary 1
Breaking Data Science Open: How Open Data Science Is Eating the World
About Us
Michele Chambers @mcAnalytics
• EVP Product & CMO, Continuum Analytics
• M.B.A., Duke University; B.S. Computer Engineering
• Author
  • Breaking Data Science Open (O’Reilly)
  • Modern Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
  • Advanced Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
  • Big Data, Big Analytics (Wiley)
About Us
Christine Doig @ch_doig
• Sr. Data Scientist & Product Mgr., Continuum Analytics
• M.S. in Industrial Engineering, Polytechnic University of Catalonia
• Open source advocate and speaker: PyData, EuroPython, SciPy, PyCon
• Author: Breaking Data Science Open (O’Reilly)
• 5+ years in advanced analytics, operations research, and machine learning in energy, manufacturing & banking
Business Intelligence & Predictive Analytics: Using Data for Insight & Human-in-the-Loop Actions
Cognitive Intelligence: Using Data & Deep Learning to Make Recommendations
Open Data Science: Connecting Data, Analytics & Computation
Data Science is…
“An interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms.”
Open Data Science is…
an inclusive movement that makes the open source tools of data science (data, analytics & computation) easily work together as a connected ecosystem
Data Science is not just Machine Learning…
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
Data Science is Interdisciplinary…
Distributed Systems: Hadoop, Spark
Business Intelligence: data warehouse, querying, reporting
Machine Learning / Statistics: classification, regression, deep learning, PCA
Web: web crawling, scraping, 3rd-party data & API providers, predictive services & APIs
Scientific Computing / HPC: GPUs, multi-cores
Open Source Communities Create Powerful Technology for Data Science
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
Machine Learning / Statistics
Numba, dask, xlwings, Airflow, Blaze
Python is the Common Language
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
Machine Learning / Statistics
Numba, dask, xlwings, Airflow, Blaze
Python’s Not the Only One…
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
SQL
Machine Learning / Statistics
But it’s also a Great Glue Language
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
SQL
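To make the glue idea concrete, here is a minimal sketch that uses only Python's standard library: SQL does the grouping and summing (through the built-in `sqlite3` module and an in-memory database), and Python does the follow-on analysis with the `statistics` module. The `sales` table, its columns, the `regional_summary` helper, and the sample rows are hypothetical illustration data, not anything from the deck.

```python
import sqlite3
import statistics


def regional_summary(rows):
    """Aggregate (region, amount) rows with SQL, then summarize them in Python.

    The 'sales' table and its columns are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # SQL side: create a throwaway table and load the rows into it.
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Let the database do the grouping and summing.
        totals = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()

    if not totals:
        return {"totals": {}, "mean_total": None, "max_region": None}

    # Python side: run analytics on the query result.
    amounts = [total for _, total in totals]
    return {
        "totals": dict(totals),
        "mean_total": statistics.mean(amounts),
        "max_region": max(totals, key=lambda pair: pair[1])[0],
    }


if __name__ == "__main__":
    data = [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 50.0)]
    print(regional_summary(data))
```

The same split applies at larger scale: the query engine handles the set-based work it is good at, and Python picks up the result for modeling or visualization.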
Anaconda is the Open Data Science Platform Bringing Technology Together…
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
Machine Learning / Statistics
Numba, dask, xlwings, Airflow, Blaze
Empowering the Data Science Team
Data Scientist • Biz Analyst • Data Engineer • Developer • DevOps
Explore & Analyze
Collaborate & Publish
Deploy & Operate
Modern Data Science Teams use…
Data Scientist: Hadoop / Spark, programming languages, analytic libraries, IDE, notebooks, visualization
Biz Analyst: spreadsheets, visualization, notebooks, analytic development environment
Data Engineer: database / data warehouse, ETL
Developer: programming languages, analytic libraries, IDE, notebooks, visualization
DevOps: database / data warehouse, middleware, programming languages
RIGHT TECHNOLOGY FOR THE PROBLEM
Modern Data Science Teams Want…
DATA SCIENCE COLLABORATION
SELF-SERVICE DATA SCIENCE
DATA SCIENCE DEPLOYMENT
OPEN DATA SCIENCE
Anaconda is the leading Open Data Science platform, powered by Python, the fastest-growing data science language
• Accelerate Time-to-Value
• Connect Data, Analytics & Compute
• Empower Data Science Teams
ACCELERATE Time-to-Value
• INNOVATE faster through managed agile experimentation
• MOVE from analysis to deployment immediately
• DELIVER powerful results backed by a high-performance open data science platform
CONNECT Data, Analytics & Compute
• LEVERAGE innovative open source analytics to extract value from data
• MAXIMIZE your computational power to easily analyze all data
• CONNECT and integrate all your data sources for predictive models
EMPOWER Data Science Teams
• ITERATE quickly to create powerful analyses and predictive models
• COLLABORATE and share with your data science team
• PUBLISH interactive results to the business
Open Data Science Platform: ACCELERATE. CONNECT. EMPOWER.
Anaconda Gives Superpowers To People Who Change The World
Open Data Science: A Vibrant and Growing Community
• Python community: 30M+
• Packages in Anaconda: 720+
• R community: 16M+
• Spark Python usage: 50%+
• Anaconda downloads: 3M+
Anaconda is Trusted by Industry Leaders
• Financial Services: risk management, quant modeling, data exploration and processing, algorithmic trading, compliance reporting
• Government: fraud detection, data crawling, web & cyber data analytics, statistical modeling
• Healthcare & Life Sciences: genomics data processing, cancer research, natural language processing for health data science
• High Tech: customer behavior, recommendations, ad bidding, retargeting, social media analytics
• Retail & CPG: engineering simulation, supply chain modeling, scientific analysis
• Oil & Gas: pipeline monitoring, noise logging, seismic data processing, geophysics
DEMOS
Anaconda Enterprise Notebooks A collaborative environment for Data Science teams
Anaconda Fusion Bringing Data Science and Interactive Visualizations to Microsoft Excel
Anaconda Enterprise Notebooks A collaborative environment for data science teams
Search projects by tag and collaborator
Manage contributors
Manage collaborative projects
Organize notebooks, scripts and other files in projects
Manage team collaborators
Save favorite projects
Data lineage
Interactive Visualizations
Advanced notebook extensions
Access to collaborative executable notebooks
Use advanced notebook extensions for enhanced collaboration
• Publishing via Anaconda Repository integration
• Revision control, commits and notebook diff comparison
• Collaborative locking
• Advanced interactive presentation editor
Easily publish and share your results with Business Leaders and Analysts
Leverage revision control, commit and diff comparison in notebooks
Notebook version tracking
Notebook change diff comparison
Commit your work so you can return to earlier revisions and compare changes against them
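As a rough illustration of what a notebook diff compares, the sketch below extracts the code-cell sources from two notebook revisions and diffs them with Python's `difflib`. It assumes the Jupyter nbformat v4 JSON layout, in which each entry in `cells` carries a `cell_type` and a `source` that is either a string or a list of strings. It is not how Anaconda Enterprise Notebooks implements the feature, and the helper names `load_notebook`, `code_cell_sources`, and `diff_notebooks` are hypothetical.

```python
import difflib
import json


def load_notebook(path):
    """Read an .ipynb file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def code_cell_sources(notebook):
    """Return the source text of each code cell, in order (nbformat v4 layout assumed)."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        src = cell.get("source", "")
        # nbformat allows source as one string or as a list of line strings.
        sources.append(src if isinstance(src, str) else "".join(src))
    return sources


def diff_notebooks(old, new, old_name="old.ipynb", new_name="new.ipynb"):
    """Unified diff of the code cells of two notebook revisions ('' if identical)."""
    old_lines = "\n".join(code_cell_sources(old)).splitlines()
    new_lines = "\n".join(code_cell_sources(new)).splitlines()
    return "\n".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


if __name__ == "__main__":
    rev1 = {"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)"]}]}
    rev2 = {"cells": [{"cell_type": "code", "source": "x = 2\nprint(x)"}]}
    print(diff_notebooks(rev1, rev2))
```

Markdown cells are skipped here, which keeps the diff focused on code; a fuller tool would also compare outputs and metadata.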
Collaborate using notebook locking features
Transform a Notebook into an Interactive Presentation with an Advanced Editor
Edit slide layout and content
Edit slide theme
Present your slides with embedded interactive visualizations
Anaconda Fusion Bringing Data Science and Interactive Visualizations to Microsoft Excel
Create browser-based Interactive Visualizations directly from your spreadsheet
Write your visualization directly into the formula
Access a powerful interactive toolbox
Enhance exploration with a customizable hover tool
Interactively explore your spreadsheet data with the cross filter app
Select the variables to plot and the color, palette, and size of the points
Immediately view your updates in the visualization
Access advanced Machine Learning models to cluster your data
Simple formulas for advanced modeling applications
Easily input variables into algorithms with interactive widgets
Access a wide range of modeling algorithms
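The slide does not show Anaconda Fusion's formulas, so rather than guess at them, here is a minimal sketch of one clustering model of the kind it refers to: k-means (Lloyd's algorithm) in pure Python. The `kmeans` function is hypothetical, and seeding the centroids with the first `k` points and capping the loop at `max_iter` iterations are simplifying assumptions; production libraries use smarter initialization such as k-means++.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points (equal-length numeric tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        # Converged: neither assignments nor centroids changed.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
    centers, assignment = kmeans(pts, k=2)
    print(centers, assignment)
```

Because the result depends on the initial centroids, real tools typically run k-means several times from different starting points and keep the best clustering.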
Anaconda Enterprise: Open Data Science Platform
DATA SCIENCE COLLABORATION: Empower the Data Science Team
• Explore data interactively
• Build, test & validate data science models with Python & R
• Publish, share & reproduce data science results easily
SELF-SERVICE DATA SCIENCE: Arm Citizen Data Scientists with Intelligent Apps
• Empower your team with intelligent & interactive apps
• Leverage data science from Microsoft Excel®
• Create portable data transformations for reuse
DATA SCIENCE DEPLOYMENT: Move Data Science into Production to Get Results
• Go from ad hoc to production deployment easily
• Launch & provision distributed environments
• Boost performance by maximizing your computational power
Starting the Journey to Open Data Science
Journey to Open Data Science
What are typical enterprise barriers to adopting Open Data Science?
1. Reproducibility
2. Governance
3. Open source assurance
Reproducibility: Embrace Innovation Without Anarchy
Source: http://www.slideshare.net/RevolutionAnalytics/r-at-microsoft
Governance: Embrace Innovation Without Anarchy
Controlled access to data science assets
Open Source Assurance: Embrace Innovation Without Risk
Mitigate legal risk by selecting appropriate OSS licenses and relying on vendor-backed open source assurance
Next Steps
• Download Anaconda: continuum.io/downloads (documentation: docs.continuum.io/)
• Check Out Anaconda Enterprise: Anaconda with scalable high performance, team collaboration & governance (continuum.io/anaconda-subscriptions/anaconda-enterprise)
• Get Data Science Training: private corporate training and public online training formats are available at continuum.io/training
• Migrate Your First Model to Python: engage us to migrate SAS models to Python; to learn more, contact [email protected]
Anaconda Subscriptions
Thank You
Michele Chambers, Twitter: @mcAnalytics
Christine Doig, Twitter: @ch_doig
Email: [email protected], Twitter: @ContinuumIO
Continuum Analytics: We empower data science teams to make the world a better place