Data Science Notebook Webinar 2017-11-16
Data Science Notebook Guidelines
ODPi BI & Data Science SIG: Cupid Chan
Moon Soo Lee, Frank McQuillan
BI & Data Science Special Interest Group (SIG)
• Bridging the gap so that BI tools can sit harmoniously on top of both Hadoop and RDBMS, while providing the same, or even more, business insight to BI users who also have Hadoop in the backend.
• Provide an objective guideline for evaluating the effectiveness of a BI solution and/or other related middleware technologies.
Target user persona
• Jupyter: Data science user with programming experience in one of the supported kernels
• Zeppelin: Data engineers, data scientists, and business users in the same data processing pipeline who need to collaborate
Installation
• Jupyter: Easy installation with Anaconda or pip. Standalone, or Hadoop and Spark (via YARN) clusters supported.
• Zeppelin: Download binary package and start daemon script. Included in HDP.
Configuration
• Jupyter: Edit config files or use command line tool for notebook settings. Community maintained language kernels have various configuration workflows.
• Zeppelin: Edit config files. Interpreters can be configured through GUI.
User Interface
• Jupyter: Functional notebook user interface that can be used to create readable analyses combining code, images, comments, formulae and plots.
• Zeppelin: Notebook interface in which users can document, run code, and visualize outputs with a flexible layout and multiple look-and-feel options.
Supported languages
• Jupyter: Python, R, Julia and dozens of community maintained kernels
• Zeppelin: Support for various languages is included in the binary package, such as Spark, Python, and JDBC. 3rd party interpreters are available through an online registry.
Multi-user support
• Jupyter: Native Jupyter does not support multi-user. However, JupyterHub can be used to serve notebooks to users working in separate sessions.
• Zeppelin: Multiple users can collaborate in real-time on a notebook. Multiple users can work with multiple languages in the same notebook.
Support and community
• Jupyter: Mature project with active community and good support. The Jupyter project was born in 2014 but has roots going back to 2001.
• Zeppelin: Apache Zeppelin is one of the most active projects in the Apache Software Foundation. The project was born in 2013 and became a top-level project of the ASF in 2015.
Architecture
• Jupyter: The notebook server sends code to language kernels, renders in a browser, and stores code/output/Markdown in JSON files.
• Zeppelin: Zeppelin server daemon manages multiple interpreters (backend integrations). Web application communicates with the server using WebSocket for real-time communication.
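As a rough illustration of the Jupyter bullet above, the sketch below builds a minimal notebook document by hand and serializes it as JSON. It assumes the version 4 notebook format; the helper name `build_notebook` and the cell contents are made up for illustration, and real notebooks typically carry more metadata.

```python
import json


def build_notebook():
    """Return a minimal notebook document as a plain dict (hypothetical helper)."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Quick exploration\n", "Notes live next to the code."],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["print(1 + 1)"],
        # Output captured from the kernel is stored alongside the code.
        "outputs": [
            {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
        ],
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,  # assumed format version
        "nbformat_minor": 2,
    }


# Serialize the notebook the way it would sit on disk in an .ipynb file.
notebook_json = json.dumps(build_notebook(), indent=1)

# Round-trip to show the file content is ordinary JSON.
cell_types = [cell["cell_type"] for cell in json.loads(notebook_json)["cells"]]
print(cell_types)
```

Because code, output and Markdown all live in one JSON document, a notebook can be shared as a single file.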
Big data ecosystem
• Jupyter: Can be connected to a variety of big data execution engines and frameworks: Spark, massively parallel processing (MPP) databases, Hadoop, etc.
• Zeppelin: Tightly integrated with Apache Spark and other big data processing engines.
Security
• Jupyter: Code executed in the notebook is trusted, like any other Python program. Token-based authentication on by default. Root use disabled by default.
• Zeppelin: User authentication (LDAP, AD integration). Notebook ACL. Interpreter ACL. SSL connection.
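The general idea behind token-based authentication can be sketched in a few lines. This is not Jupyter's actual implementation, only an illustration under simple assumptions: the server generates a random secret at startup and accepts a request only if it presents the same value. The function names `new_token` and `is_authorized` are hypothetical.

```python
import hmac
import secrets


def new_token():
    """Generate a random hex token (hypothetical helper)."""
    return secrets.token_hex(24)


def is_authorized(presented, expected):
    """Compare tokens in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented.encode(), expected.encode())


# The server would create the token once and show it to the user.
server_token = new_token()

print(is_authorized(server_token, server_token))  # matching token
print(is_authorized("guess", server_token))       # wrong token
```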
Data science readiness
• Jupyter: Widely used by data scientists for a variety of tasks including quick exploration, documentation of findings, reproducibility, teaching, and presentations
• Zeppelin: Data scientists can collaborate with each other. Business users can also log in and collaborate with data scientists directly on notebooks.
Jupyter
Frank McQuillan
Agenda
• What is a Jupyter notebook?
• Lightning tutorial - my first Jupyter notebook
• Data science examples
– Python
– SQL
• Key strengths and potential areas of improvement
What is a Jupyter Notebook?
• Tell a story with your data
• Program in a web browser
• “Multimodal”
• Favorite tool of data scientists and researchers
Support and Community
• 2001 - IPython notebook project (Fernando Perez)
• 2014 - Jupyter notebook launched
• Open source (modified BSD license)
• Steering council of ~15 members from academia and commercial companies
• Mature product with active community: https://stackoverflow.com/search?q=jupyter returns ~10,500 results
Architecture
● IPython
● IRkernel
● IJulia
● Dozens of community maintained kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
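To make the kernel list above more concrete, here is a small sketch of how a kernel can be described to the notebook server. It assumes the `kernel.json` kernel spec layout (`argv`, `display_name`, `language`); the specific values and the `launch_command` helper are illustrative only, and the placeholder substitution is a simplification of what a real launcher does.

```python
import json

# Hypothetical kernel spec for a Python kernel; field values are illustrative.
kernel_spec = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3",
    "language": "python",
}


def launch_command(spec, connection_file):
    """Fill in the connection file placeholder to get the command to run."""
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]


# The spec would normally be stored as a kernel.json file.
print(json.dumps(kernel_spec, indent=2))
print(launch_command(kernel_spec, "/tmp/kernel-1234.json"))
```

Each language kernel ships its own spec of this kind, so one notebook front end can work with many languages.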
Demo
Summary
• Key strengths
– Data science friendly
– Mature project
– Widely used
– Intuitive UI
– Nice presentation of code, images, comments, formulae
– Lots of available kernels
• Some potential improvements
– Multi-user support
– Cell drag and drop
– Hiding code/output
– IDE type operations like syntax checking, version control, running code one line at a time
Zeppelin
Moon Soo Lee
Slide & demo notebook - https://s.apache.org/ZPLN