Stack The Liip Data Science - Netlivemypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC... ·...
![Page 1: Stack The Liip Data Science - Netlivemypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC... · 2018-05-09 · The Liip Data Science Stack Insights from building and maintaining](https://reader034.fdocuments.us/reader034/viewer/2022042302/5ecd68a699ab5f0b09512959/html5/thumbnails/1.jpg)
The Liip Data Science Stack
Insights from building and maintaining it
Zürich, 08.05.2018
About me - Quick facts
Dr. Thomas Ebermann
- Diploma in Computer Science at the Universities of Mannheim & Waterloo.
- PhD in Computational Social Science, predicting information flow on Twitter.
- Working for Liip as a Data Scientist since 2016.
- Love Ruby and Python.
LIIP PRINCIPLES
- Purpose over profits
- Trust over control
- Practice over theory
- Risk over safety
- Flexibility over strength
- Open over closed
- Compasses over maps
The Data Science Stack
History
– All GitHub stars
– All my bookmarks (mobile and Mac)
– Email / newsletters
– Internal company Slack
– Collect all the data science tools that I use on a regular basis or that have emerged on my horizon.
– Finally sort the mess in my head.
The Stack Idea
We use stacks in web dev in various areas, where we describe systems that build on top of each other and work well together:
LAMP Stack (Linux, Apache, MySQL, PHP)
Why not have a data science stack of tools that work well together?
Instead of pointing to only one tool, let's point to whole families of similar tools.
The Data Science Stack
- Data Sources: Where does the data come from?
- Data Processing: How can we clean and transform it?
- Database: How can we efficiently store/retrieve/search it?
- Analysis: How can we analyse it?
- Visualisation: How can we visualise it?
- Business Intelligence: Are there solutions that can do all in one?
But wait, what about the Gartner reports?
- Very high level
- Only big players
- Very few open source solutions
- No small tools
- Have to sell your soul to get into these Magic Quadrants
2017 Version
The 2017 PDF Poster
- 250 tools in one poster
- Provide orientation like a map
- Discover your white spots on the map
- Over 30’000 visitors
- Over 4’300 downloads worldwide
- Over 300 mail signups to be notified for Version 2
Quite a success but it was out of date the day we created it!
Insights
Insights Data Sources
- Scrapers (7): Lots of tools and variety, very open source friendly (PhantomJS+Capybara)
- Website Analytics (37): There are surprisingly many more tools out there than just Google Analytics (Google Analytics)
- Tag Management (6): A lot of competition has emerged since Google Tag Manager (Google Tag Manager)
- Heatmaps (5): Controversial but insightful (Hotjar)
- Mobile Analytics (18): A lot of specialized tools (Google Analytics)
- Social Media (12): Due to exclusive contracts and harmonization/acquisitions, there are only a few big cross-platform data providers out there (Brandwatch)
- IoT (8): Plays only a marginal role for us as a data source right now (Ubidots)
Insights Data Processing
- ETL (10): Tools for very large scale or data lakes (Talend)
- Data Cleaning (3): User-friendly tools exist that target not only the data scientist (Trifacta)
- Alerting & Logging (7): Excellent open source, production-ready solutions change the way logs are consumed these days (Graylog)
- Message Queues (20): Pub/sub (Kafka); real-time processing on the fly is the new paradigm (Flink); the Apache Foundation is very active here
Insights Databases
- Databases (43): There is much more than MySQL vs. NoSQL: graph databases (Neo4j), time series databases (TimescaleDB), key-value (Redis), column-oriented (Vertica, VoltDB, Exasol)
- Search (20): A lot of good alternatives to Solr exist nowadays (Elastic), and SaaS is very popular (Algolia)
- Hadoop Ecosystem (13): The whole zoo of tools is maturely integrated, yet remains complex (Spark)
Insights Analysis
- Deep Learning (21): Huge momentum; lots of different frameworks and applications are popping up (TensorFlow/Keras)
- Statistical software packages (11): The old monoliths are slowly being surpassed by open source solutions (R, RapidMiner, Orange)
- General ML libraries (24): A myriad of choices for every programming language, yet Python remains subjectively the most active one (scikit-learn)
- Computer Vision (9): All of the big five offer SaaS solutions, but open source is strong (OpenCV)
- NLP/Speech recognition (23): Same here (Wit.ai)
- Assistants/Chatbots (15): A lot of promising solutions and frameworks quickly emerged (Chatfuel)
Insights Visualization
- General Visualisation (32): Huge number of tools, with stable candidates for Python (seaborn) and R (Shiny)
- JS visualisation (28): JS libs are popping up every week (D3) :)
- Dashboards (17): The line between BI and dashboards is blurring; not too many open source solutions are available (Plotly)
Business Intelligence
- Business Intelligence (46): I thought I knew a couple of alternatives, but the options are vast and highly competitive. Most solutions are commercial, but good open source solutions are available (Kibana, Tableau). Ask Gartner :)
- BI on Hadoop (5): Hard to see where the solutions begin and the architecture ends (Datameer)
- Data Science Platforms (23): The new BI. A combination of the freedom of IPython notebooks and solid infrastructure (DataRobot). Automated ML.
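The category counts quoted in parentheses on the insights slides can be tallied per stack layer. The following is only a minimal sketch: the counts are copied from the slides, while the nested dictionary layout and the helper names `tools_per_layer` and `largest_category` are assumptions made for illustration. The sums treat every count as independent, so a tool listed under two categories would be counted twice.

```python
# Category counts copied from the parentheses on the insights slides.
# The nesting (layer -> category -> count) is an illustrative assumption,
# not the data model of the actual website.
STACK = {
    "Data Sources": {
        "Scrapers": 7,
        "Website Analytics": 37,
        "Tag Management": 6,
        "Heatmaps": 5,
        "Mobile Analytics": 18,
        "Social Media": 12,
        "IoT": 8,
    },
    "Data Processing": {
        "ETL": 10,
        "Data Cleaning": 3,
        "Alerting & Logging": 7,
        "Message Queues": 20,
    },
    "Databases": {
        "Databases": 43,
        "Search": 20,
        "Hadoop Ecosystem": 13,
    },
    "Analysis": {
        "Deep Learning": 21,
        "Statistical software packages": 11,
        "General ML libraries": 24,
        "Computer Vision": 9,
        "NLP/Speech recognition": 23,
        "Assistants/Chatbots": 15,
    },
    "Visualization": {
        "General Visualisation": 32,
        "JS visualisation": 28,
        "Dashboards": 17,
    },
    "Business Intelligence": {
        "Business Intelligence": 46,
        "BI on Hadoop": 5,
        "Data Science Platforms": 23,
    },
}


def tools_per_layer(stack):
    """Return the summed category counts for each layer."""
    return {layer: sum(cats.values()) for layer, cats in stack.items()}


def largest_category(stack):
    """Return (layer, category, count) for the category with the highest count."""
    entries = (
        (layer, name, count)
        for layer, cats in stack.items()
        for name, count in cats.items()
    )
    return max(entries, key=lambda entry: entry[2])


if __name__ == "__main__":
    for layer, total in tools_per_layer(STACK).items():
        print(f"{layer}: {total}")
    print("Largest category:", largest_category(STACK))
```

Keeping the counts in one plain structure like this makes it easy to re-run the tally whenever a category is added or a count changes.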
2018 Version
From PDF to Website
http://datasciencestack.liip.ch
Features I
- You can add tools too!
- Search
Features II
- Internal Liip technology DB integration (Zebra)
- Quarterly mailing list (keeps busy decision-makers up to date)
- JSON Export
Insights 2018
Outlook
What's next?
Assessment of Tools
- Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.
- Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.
- Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
- Hold: Proceed with caution.
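One way such an assessment could be represented in code is sketched below. The four ring names come from the slide; the `Ring` enum, the `group_by_ring` helper and the `example-tool-*` entries are hypothetical placeholders, since the slides do not assign any tool to a ring.

```python
from enum import Enum


class Ring(Enum):
    """The four assessment levels named on the slide."""
    ADOPT = "Adopt"
    TRIAL = "Trial"
    ASSESS = "Assess"
    HOLD = "Hold"


# Hypothetical placeholder entries: the slides do not say which tool
# belongs to which ring, so these names carry no real assessment.
ASSESSMENTS = [
    ("example-tool-a", Ring.ADOPT),
    ("example-tool-b", Ring.TRIAL),
    ("example-tool-c", Ring.ASSESS),
    ("example-tool-d", Ring.HOLD),
    ("example-tool-e", Ring.ADOPT),
]


def group_by_ring(assessments):
    """Group tool names by ring, keeping the Adopt -> Hold order."""
    grouped = {ring: [] for ring in Ring}
    for tool, ring in assessments:
        grouped[ring].append(tool)
    return grouped


if __name__ == "__main__":
    for ring, tools in group_by_ring(ASSESSMENTS).items():
        print(f"{ring.value}: {', '.join(tools) or '-'}")
```

Using an enum rather than free-text labels means a misspelled ring name fails immediately instead of silently creating a fifth group.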
Solid rucksack
Data Sources: Google Analytics
Processing: Trifacta
Analysis: Scikit-Learn
Visualization: Highcharts (JS), Shiny (R), seaborn (Python)
Business Intelligence: KNIME
Trendy rucksack
Sources: Chartbeat or Snowplow
Processing: Fluentd
Analysis: Keras
Visualization: Plotly
Business Intelligence: DataRobot or Dataiku
152 employees
5 locations
1 vision
Tuesday 10:21
St. Gallen
Zürich
Bern
Fribourg
Lausanne
Data Services @ Liip
Virtual Assistants
• Chatbots and Assistants
Data Solutions
• Recommender Systems
• Computer Vision
• Speech Recognition
• Integrated ML Models
• Whole Web-apps / apps
Data Science / Consulting
• From Data to Insights
• Data Analysis
• Network Analysis (SNA)
• Social Graph
• Time Series
• Machine Learning
Data Visualization
• Data Visualization
• Geo Visualization
• Data-Modeling
• Real Time Dashboarding
Big Data
• Storage (Hadoop)
• Analysis (Spark)
• Data Streams (Kafka)
Open Data
• Infrastructure (CKAN)
• Linked Data
Data-Driven User Experiences
• Data Interfaces
• Conversational Design
Mobile AI
• CoreML
Thank you!
Excited to hear your questions.
Dr. Thomas Ebermann
Data Scientist
Insights Data Sources
Scrapers (7), Website Analytics (37), Social Media (12), Tag Management (6), Mobile Analytics (18), Heatmaps (5), IoT (8)