Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017
![Page 1: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/1.jpg)
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Enterprise Data Science at Scale: Introducing Data Science Experience (DSX)
Future of Data – Princeton Meetup, 14-November-2017
![Page 2: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/2.jpg)
Presenter
Tim Spann
![Page 3: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/3.jpg)
IBM + Hortonworks = Unlocking Actionable Insights
Hortonworks
• #1 Pure Open Source Hadoop Distribution
• 1000+ customers and 2100+ ecosystem partners
• Employs the original architects, developers and operators of Hadoop from Yahoo!
• Best-in-class 24x7 customer support
• Leading professional services and training
IBM
• #1 Data Science Platform (Source: Gartner)
• OpenPOWER performance leadership
• Flexible, software defined storage
• #1 SQL Engine for complex, analytical workloads
• Leader in On-premise and Hybrid Cloud solutions
![Page 4: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/4.jpg)
Data Science In Action
The Team
• Data Scientists: Responsible for “The Math”
• Data Engineers: Responsible for “The Data”
• Business Analyst: Responsible for “The Business”
• Corporate IT: Responsible for “Technology”
The Process
![Page 5: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/5.jpg)
Data Science Challenges
The Team
• Data Scientists: “I like my own tools” / “How can I productionize my model?”
• Data Engineers: “I need a central place for data” / “How can I efficiently transform data?”
• Business Analyst: “I need to visualize the shape of data” / “How can we fail fast and prototype quickly?”
• Corporate IT: “How do I govern and secure this?” / “I can’t support all of these tools”
The Process
• Productionizing with data
• So many tools & limited compute resources
• Data Discovery
• Model deterioration & data evolution
![Page 6: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/6.jpg)
The IBM + HWK Data Science Experience
The Team
• Data Scientists: Tools (RStudio, Jupyter, Zeppelin, H2O, etc.) and model management
• Data Engineers: Place all data assets in one place; productionize models with REST endpoints
• Business Analyst: Rich data visualization; community and collaboration of knowledge
• Corporate IT: Run secure & governed data science; one experience to support many tools
The Process
• Collaboration
• Community
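The slide mentions productionizing models with REST endpoints but does not show what such an endpoint looks like. As a rough sketch only (this is not the DSX deployment API), the idea can be illustrated with a plain WSGI application from the Python standard library. The `/score` path, the `support_calls` field, and the `toy_model` function are all hypothetical names invented for this example.

```python
import io
import json


def toy_model(features):
    """Hypothetical stand-in for a trained model: more support calls, more risk."""
    return min(1.0, features.get("support_calls", 0) / 10)


def scoring_app(environ, start_response):
    """WSGI app: POST /score with a JSON body returns a churn probability."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/score":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    features = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"churn_probability": toy_model(features)}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call(app, method, path, payload):
    """Invoke the WSGI app in-process, without opening a network socket."""
    raw = json.dumps(payload).encode()
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    status = []
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))


status, reply = call(scoring_app, "POST", "/score", {"support_calls": 5})
print(status, reply)
```

To serve the same app over HTTP for local experiments, it could be passed to `wsgiref.simple_server.make_server("localhost", 8000, scoring_app)`; a production deployment would sit behind whatever serving layer the platform provides.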
![Page 7: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/7.jpg)
Data Science Solution
Community
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
Open Source
• Code in Scala/Python/R/SQL
• Zeppelin & Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
Scale & Enterprise Security
• Data Science at Scale
• Run Spark Jobs on HDP Cluster
• Secure Hadoop Support
• Ranger Atlas Support for Data
• Support for ABAC
Model Management
• Data Shaping Pipeline UI
• Auto-data preparation & modeling
• Advanced Visualizations
• Model management & deployment
• Documented Model APIs
Data Science Experience
![Page 8: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/8.jpg)
DEMO
![Page 9: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/9.jpg)
Use Case
• All industries are affected by churn.
• Being able to predict churn helps companies take action and keep customers longer.
• The more historical data, the better the model.
• Data is collected and labeled over time based on churn.
• Using a Random Forest, we will predict future churners.
Customer Churn Architecture
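The slides name Random Forest as the model but show no code. The following is a minimal sketch in plain Python of how such a classifier works (bootstrap sampling, a random feature subset per split, and averaging the trees' votes). It is not the notebook used in the demo, which would more likely rely on Spark. The three feature columns and all data values are invented for illustration.

```python
import random

# Hypothetical training data, invented for illustration only.
# Columns: (months_as_customer, support_calls, monthly_charge); label 1 = churned.
ROWS = [
    (2, 5, 80.0), (3, 4, 75.0), (1, 6, 90.0), (4, 5, 85.0), (2, 7, 95.0), (5, 4, 70.0),
    (36, 0, 40.0), (48, 1, 45.0), (24, 0, 50.0), (60, 1, 35.0), (30, 0, 55.0), (42, 1, 30.0),
]
LABELS = [1] * 6 + [0] * 6


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; a leaf is the fraction of churners that reached it."""
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per split
    split = None
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:
        return sum(labels) / len(labels)
    f, t = split
    l_rows = [r for r in rows if r[f] <= t]
    l_labels = [y for r, y in zip(rows, labels) if r[f] <= t]
    r_rows = [r for r in rows if r[f] > t]
    r_labels = [y for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree(l_rows, l_labels, rng, depth + 1, max_depth),
            build_tree(r_rows, r_labels, rng, depth + 1, max_depth))


def tree_predict(node, row):
    """Walk from the root to a leaf and return its churn fraction."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def churn_probability(forest, row):
    """Average the per-tree churn fractions."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)


forest = fit_forest(ROWS, LABELS)
print(churn_probability(forest, (1, 8, 99.0)))   # new, unhappy customer
print(churn_probability(forest, (60, 0, 30.0)))  # long-standing customer
```

On real data the labeled history would be far larger, which is the point the slide makes about historical data improving the model.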
![Page 10: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/10.jpg)
Demo Scenario: Assessing Customer Churn Probability in Real Time
• Stored long term data on customer churn behavior
• New real time data coming in
• Predict a customer's churn probability before they churn
• Alert the proper department or manager
• Business monitors customer retention outlook & performance
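The flow in the bullets above (score incoming events, then alert the right department when the churn probability is high) might be sketched as follows. This is an assumption-laden illustration, not the demo's actual pipeline: the event fields, the `toy_score` function, the 0.7 threshold, and the "retention" department name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChurnAlert:
    customer_id: str
    probability: float
    department: str


def toy_score(event):
    """Hypothetical stand-in for the deployed model."""
    return min(1.0, event["support_calls"] / 10)


def route_alerts(events, score, threshold=0.7, department="retention"):
    """Yield an alert for every incoming event scored at or above the threshold."""
    for event in events:
        probability = score(event)
        if probability >= threshold:
            yield ChurnAlert(event["customer_id"], probability, department)


# Invented sample events standing in for the real-time feed.
incoming = [
    {"customer_id": "c1", "support_calls": 8},
    {"customer_id": "c2", "support_calls": 1},
    {"customer_id": "c3", "support_calls": 9},
]

for alert in route_alerts(incoming, toy_score):
    print(alert)
```

Because `route_alerts` is a generator, it can consume an unbounded stream one event at a time; swapping `toy_score` for a call to a deployed model would leave the routing logic unchanged.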
![Page 11: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017](https://reader031.fdocuments.us/reader031/viewer/2022030317/5a65429a7f8b9ace0b8b48a7/html5/thumbnails/11.jpg)
Demo Scenario: Problems Solved
• Data Scientists collaborate and learn new tools & frameworks
• Choice of tools, notebooks and languages
• Run favorite notebook on all data in the HDP Cluster
• Deploy the model to production
• Leverage the production model to deliver insights to business
• Monitor models and retrain models as new data comes in
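The last bullet, monitoring models and retraining as new data arrives, can be illustrated with a small rolling-accuracy check. This is only one possible approach, sketched under assumptions: the `ModelMonitor` class, the window size, and the accuracy threshold are hypothetical and not taken from the presentation.

```python
from collections import deque


class ModelMonitor:
    """Track prediction accuracy over the most recent labeled outcomes."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # True where prediction matched reality
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction turned out to be correct."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return rolling accuracy, or None before any outcome is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag retraining once the window is full and accuracy has dropped."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy


monitor = ModelMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())

# Customer behavior shifts and the model starts missing.
monitor.record(1, 0)
monitor.record(0, 1)
print(monitor.accuracy(), monitor.needs_retraining())
```

Waiting for a full window before flagging avoids triggering a retrain on the first few unlucky predictions.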