SDS2018 CI CD Pipeline… · Current Data Science Production Challenges • IT sent software to...
AnalyticOps
DevOps for Data Science
Ryan Krebs, Matt von Rohr
• Ransom Eli Olds
• Ford Motor Company – „Moving Conveyor“
• Current Data Science Production Challenges
• Dockerized Model Management
The „Model“ Assembly Line
https://en.wikipedia.org/wiki/Ford_Model_T
3 © 2018 Teradata
Machine Learning Silos
Ops: In Operations we outsource our analytics to a 3rd party vendor.
IT: IT has its own AWS Kubernetes clusters running Dockerized Python models.
HR: HR exports from QlikView, runs ML models on laptops in R, then distributes results in Excel.
Finance: The Finance data science team has a deep learning model written in Scala using the On-Prem DEV Hadoop Cluster.
4 © 2017 Teradata
Current Data Science Production Challenges
• IT is sent software to re-implement and deploy
• Ad-hoc process
• Data Scientist sits in Analytic Silo
• Custom datasets
• Variety of modelling techniques and technologies
• Focus on trained model historical performance
Trained Models
Analytics
IT
• Business reviews reports on models
• Multiple stakeholders and objectives
Performance Reports
Business
Inconsistent: Data used in Analytics and Production?; Multiple reporting tools
Manual: Model training; Custom DS reports; IT re-writes; Meetings to approve
Opaque: How were models trained? Approvals?
Slow: All of the above!
Consistency: Unit testing to ensure correct data ingest; Templated model reports
Automation: CI model builds; Auto-generated reports; Trained models are production ready software; UI gathers approvals
Transparency: VCS; Model metadata; UI surfaces metadata
Agility: All of the above!
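As an illustration of "unit testing to ensure correct data ingest", here is a minimal sketch of a schema check that a CI build could run before any model training. The column names, the `EXPECTED_SCHEMA` mapping and the `validate_ingest` helper are assumptions made up for this example, not part of the talk.

```python
import csv
import io

# Hypothetical schema: column name -> parser that raises on bad input.
EXPECTED_SCHEMA = {"customer_id": int, "amount": float, "country": str}


def validate_ingest(csv_text, schema=EXPECTED_SCHEMA):
    """Parse CSV text into typed rows; raise ValueError on any schema problem."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or set(reader.fieldnames) != set(schema):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({name: parse(raw[name]) for name, parse in schema.items()})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


def test_valid_file_is_parsed():
    rows = validate_ingest("customer_id,amount,country\n1,9.5,DK\n")
    assert rows == [{"customer_id": 1, "amount": 9.5, "country": "DK"}]


def test_missing_column_is_rejected():
    try:
        validate_ingest("customer_id,amount\n1,9.5\n")
    except ValueError:
        return
    raise AssertionError("missing column should have been rejected")
```

The two `test_` functions follow the naming that common Python test runners discover, so the same check can run unchanged in Analytics and in Production.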
AnalyticOps: Simplified
Dockerized Model Management
• Model Metadata
• Scoring Services
• Champion/Challenger Automation
• Business and Data Science Approvals
• Auditability
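One way to picture model metadata, approvals and auditability together is a small record stored next to each trained model. This is only a sketch: the `ModelRecord` fields, the role names and the `fingerprint` helper are hypothetical and not taken from the product shown in the talk.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest, used here to pin the exact training data."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ModelRecord:
    name: str
    version: str
    git_commit: str              # VCS revision of the model code
    training_data_sha256: str    # which data the model was trained on
    metrics: dict                # evaluation results, e.g. {"auc": 0.91}
    approvals: dict = field(default_factory=dict)  # role -> approver name

    def approve(self, role, approver):
        self.approvals[role] = approver

    def is_deployable(self, required_roles=("business", "data_science")):
        # Assumption: both a business and a data science sign-off are required.
        return all(role in self.approvals for role in required_roles)

    def to_json(self):
        # Serialised form that a UI or audit log could store and display.
        return json.dumps(asdict(self), sort_keys=True)
```

A record only becomes deployable once every required role has signed off, and the JSON form answers "how was this model trained, and who approved it?" after the fact.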
Keys to success – IN THEORY
DEV OPS ENGINEER
’’- Ron Bodkin 2016
DELIVERY EXCELLENCE
DATA SCIENTIST
SYSTEMS ARCHITECT
BUSINESS EXPERT
SOFTWARE ENGINEER+
The Approach The Team
Software Engineering
Data Science
Business
AnalyticsOps
Hybrid Team: Unicorn vs Chimera
• Hard to find
• Expensive
• Hard to retain and inefficient

• Statistician + a little bit of a DE
• Consultant + a little bit of a DS
• BA + a little bit of a Developer
• ETL Dev + a little bit of Statistics
AnalyticOps: Potential Components
Data scientist making models
The business using a trained model
Value
Exploration
• Data Wrangling
• DS Lab
• Model scripting (untrained models)
• Testing, Training, Model Evaluation
• Version Control
• Dependency Management
Automation
• Software unit tests
• Model Training
• Storage of trained models
• Model Evaluation
• Model Business Approval/Report Creation
• Comparison vs current Live model (Champion/Challenger)
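The champion/challenger comparison in the Automation list can be sketched as a small gate that an automated build might run after training. The accuracy metric, the `min_gain` threshold and the function names are assumptions chosen for illustration; a real setup would use whatever metric the business has agreed on.

```python
def accuracy(predict, examples):
    """Fraction of (features, label) pairs that the predictor gets right."""
    if not examples:
        raise ValueError("holdout set must not be empty")
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def compare_champion_challenger(champion, challenger, holdout, min_gain=0.01):
    """Score both models on the same holdout data and decide on promotion.

    min_gain is a hypothetical margin the challenger must win by, so that
    noise alone does not trigger a redeployment.
    """
    champion_acc = accuracy(champion, holdout)
    challenger_acc = accuracy(challenger, holdout)
    return {
        "champion_accuracy": champion_acc,
        "challenger_accuracy": challenger_acc,
        "promote_challenger": challenger_acc - champion_acc >= min_gain,
    }
```

The returned dictionary could feed an auto-generated report, which the business then approves or rejects.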
Consumption
• Real-time model scoring engines
• Automatic deployment of trained model artefacts
• Dashboards and forecasts updated using new models
• Model performance monitoring
• Model output logging
Involving: Analysts, Data Scientists, Engineers, Dev Ops, Business Stakeholders
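For the "model performance monitoring" item, a minimal sketch is a rolling window over recent predictions whose true outcome has since become known. The class name, window size and alert threshold below are assumptions for illustration only.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent `window` scored cases."""

    def __init__(self, window=100, alert_below=0.8):
        self.outcomes = deque(maxlen=window)  # old entries drop off automatically
        self.alert_below = alert_below

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing observed yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        current = self.accuracy()
        return current is not None and current < self.alert_below
```

When `needs_attention()` turns true, that could be the trigger for retraining and a new champion/challenger round.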
Case 1: Production in 3 months leads to considerable savings
AnalyticOps and Deep Learning to fight fraud at Danske Bank
Impact
• Instant adoption of several algorithms to fight fraud attempts in real time: improvement of detection rate by 35%
• Fast delivery from design to production in 12 weeks within an agile framework
Situation
• All banks have an obligation to protect their customers from fraudsters using advanced techniques to break systems
Problem
• To revolutionize a major bank and fight fraud within a bank’s strictly regulated procedures and existing transactional data ecosystem
Solution
• Integrated teams working together towards production
• Following the bank’s existing standards, procedures & blueprint
Situation
• New types of Machine Learning proven to provide better outcomes than traditional approaches for insurance risk
Problem
• An insurance company wanted to build a real-time ML system able to respond to quote requests in real time, blending old and new ML techniques
Solution
• Building an Analytics Ops layer that supports multi-language ML and is able to serve such models during a real-time process in less than a second
Impact
Case 2: Smart quoting for Insurance
A Machine Learning platform for real-time insurance quotes
CDL
Current Data Storage
Message Queue System
Bridge
Post-processing
Pre-processing
Persistency Layer
Machine Learning Models in Prod
Production Development
Catalog Model
Run Production Pipeline & update Models
Promote to Scoring (packaging)
Model (Dev, Test, Validate)
Wrangle
Promote model to production
• Feature enrich
• Scoring
• Logging
Scoring Engine
Real Time/Batch Data
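The three scoring-engine steps named in the diagram (feature enrich, scoring, logging) can be sketched as one request handler. Everything concrete here is an assumption for illustration: the input fields, the derived feature, the stand-in `score` rule and the `model_version` label are hypothetical, and a real engine would load a packaged trained model instead.

```python
import json
import logging
import time

logger = logging.getLogger("scoring_engine")


def enrich(record):
    """Feature enrich: derive model inputs from the raw record."""
    enriched = dict(record)
    enriched["amount_per_item"] = record["amount"] / max(record["items"], 1)
    return enriched


def score(features):
    """Scoring: stand-in rule where a trained model would normally be called."""
    return min(1.0, features["amount_per_item"] / 1000.0)


def handle(record, model_version="hypothetical-v1"):
    """Enrich, score and log a single record; returns the scoring result."""
    started = time.perf_counter()
    result = {"score": score(enrich(record)), "model_version": model_version}
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    # Logging: keep input, output and latency so every score can be audited.
    logger.info(json.dumps(
        {"input": record, "output": result, "elapsed_ms": round(elapsed_ms, 3)}
    ))
    return result


def handle_batch(records):
    """Batch mode reuses the same path as real-time scoring."""
    return [handle(record) for record in records]
```

Routing real-time and batch data through the same `handle` function keeps the two modes consistent, which is the point of promoting one packaged model to a single scoring engine.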