Posted on 19-Aug-2015
The Process
1. Productize
   - Compelling data products
   - Innovation pipeline
2. Ruggedize
   - Toolchain: RStudio, devtools, GitHub, Travis CI, Docker
   - Strong testing
   - Production-ready architecture
3. Assimilate
   - Command line tools
   - Make it into HTTP APIs
   - Make it into Docker containers
Step 1: Productize
Internal Products:
- Ad-hoc analyses
- Internal dashboards
- Automated reports
- Rapid prototyping
External Products:
- End-user data products
- Backend services
Step 2: Ruggedize
1. Create a reproducible architecture
2. Set up strong testing & CI
3. Separate production and dev
4. Set up monitoring & reporting
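Strong testing is what gives the CI step something to enforce. In an R package this would typically be testthat tests run through devtools; as a language-neutral sketch, here is the same idea with Python's unittest. The cleaning function `clean_revenue` is hypothetical, standing in for one ETL step.

```python
import unittest


def clean_revenue(values):
    """Hypothetical ETL step: drop missing entries and negative amounts."""
    return [v for v in values if v is not None and v >= 0]


class CleanRevenueTest(unittest.TestCase):
    def test_drops_missing_and_negative(self):
        # None and -5 are removed; zero is a valid amount and is kept
        self.assertEqual(clean_revenue([10, None, -5, 0]), [10, 0])

    def test_empty_input(self):
        self.assertEqual(clean_revenue([]), [])
```

Saved as a test file, this runs with `python -m unittest`, which is the kind of command a CI service executes on every commit.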
Case Study: HB Architecture
- RStudio
- Containerized architecture
- Continuous integration
- Multiple environments
- Notifications/monitoring
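For the notifications/monitoring item, one check that suits a data stack like this is an alert when the ETL has not succeeded recently. A minimal sketch, assuming the ETL job records the time of its last successful run; the function name, the 24-hour threshold, and the message format are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def etl_staleness_alert(last_success, max_age, now=None):
    """Return an alert message if the last successful ETL run is older than
    max_age, otherwise None. Sending the message (e-mail, chat) is left out."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_success
    if age > max_age:
        hours = age.total_seconds() / 3600
        limit = max_age.total_seconds() / 3600
        return f"ETL stale: last success {hours:.1f} h ago (limit {limit:.0f} h)"
    return None


if __name__ == "__main__":
    # Demo: a run 30 hours ago breaches a 24-hour limit
    now = datetime.now(timezone.utc)
    print(etl_staleness_alert(now - timedelta(hours=30), timedelta(hours=24), now=now))
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to cover with the same kind of tests as the ETL itself.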
Data Architecture
elasticsearch:
  image: elasticsearch
shiny-server:
  image: shiny
  ports:
    - "443:443"
  links:
    - elasticsearch
etl:
  image: etl
  volumes:
    - .:/data
etl-data:
  image: etl-data
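Docker Compose uses keys such as `links` to decide start-up order: a linked service is started before the service that links to it. The sketch below models the services from the Compose file above as a Python dict and derives that order with a topological sort (`graphlib`, Python 3.9+). It illustrates the ordering rule only; it is not how Compose is implemented.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Services from the Compose file above. Only `links` adds an ordering
# constraint here; the bind mount `.:/data` on etl does not.
services = {
    "elasticsearch": {"image": "elasticsearch"},
    "shiny-server": {"image": "shiny", "ports": ["443:443"], "links": ["elasticsearch"]},
    "etl": {"image": "etl", "volumes": [".:/data"]},
    "etl-data": {"image": "etl-data"},
}


def start_order(services):
    """Order services so every linked service comes before the one linking to it."""
    # TopologicalSorter expects a mapping of node -> predecessors
    graph = {name: set(cfg.get("links", [])) for name, cfg in services.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(start_order(services))
```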
[Figure: Docker Compose containers (ETL, ETL data, Shiny Server, Elastic, rAPI) drawing on SQL and S3 and serving the web, alongside RStudio Server]
Environments
[Figure: Production and Staging run identical stacks (ETL, Shiny Server, Elastic, data volume, SQL, S3).
Production serves www.dataproduct.com and internal-dashboards.com.
Staging serves staging-www.dataproduct.com and staging-internal-dashboards.com.]
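Because the two stacks are identical, the only thing that should differ between them is configuration. A minimal sketch of that separation, using the hostnames from the slide; the `APP_ENV` variable name and the settings keys are assumptions made for illustration.

```python
import os

# Hypothetical per-environment settings: same stack, different hostnames.
ENVIRONMENTS = {
    "production": {
        "public_host": "www.dataproduct.com",
        "dashboard_host": "internal-dashboards.com",
    },
    "staging": {
        "public_host": "staging-www.dataproduct.com",
        "dashboard_host": "staging-internal-dashboards.com",
    },
}


def load_settings(env=None):
    """Select settings by name, falling back to the (hypothetical) APP_ENV
    variable and then to staging."""
    name = env or os.environ.get("APP_ENV", "staging")
    if name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {name!r}")
    return {"env": name, **ENVIRONMENTS[name]}
```

Defaulting to staging is a deliberate choice in this sketch: a container started without configuration never points at the production hostnames by accident.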
Continuous Integration
[Figure: each commit to GitHub is built by Travis CI, which applies the latest-stable tag; Staging and Production each pull latest-stable. Success!]
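The flow above comes down to one rule: every commit produces an image, but the shared latest-stable tag only moves when the build passes, and both environments deploy whatever that tag points at. A toy model of the rule, where the in-memory `Registry` class and the `image-<sha>` naming are hypothetical; a real CI job would do the equivalent with `docker build`, `docker tag`, and `docker push`.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy stand-in for a Docker registry: each tag points at one image ID."""

    tags: dict = field(default_factory=dict)

    def push(self, image_id, tag):
        self.tags[tag] = image_id  # pushing an existing tag moves it

    def pull(self, tag):
        return self.tags[tag]


def ci_build(registry, commit_sha, tests_pass):
    """Tag every build with its commit SHA; move latest-stable only on success."""
    image_id = f"image-{commit_sha}"  # hypothetical naming scheme
    registry.push(image_id, commit_sha)
    if tests_pass:
        registry.push(image_id, "latest-stable")
    return image_id


def deploy(registry, environment):
    """Staging and Production both pull the same latest-stable tag."""
    image_id = registry.pull("latest-stable")
    print(f"{environment}: running {image_id}")
    return image_id
```

A failing commit still gets its own SHA-tagged image for debugging, but it never reaches Staging or Production.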
Docker Registry / Rolling Back
[Figure: changes deployed to Prod are saved as a versioned image in the Docker Registry. "Danger! Need to roll back!": the older image is loaded from the Docker Registry. The ETL data volume appears in both panels.]
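The slide shows the ETL data volume on both sides of the rollback, which fits Docker's model: a volume lives outside the image, so swapping the image leaves the data in place. A toy sketch of the versioning logic only; the class and the `etl:v1`-style tags are hypothetical, and a real rollback would `docker pull` the older tag and restart the service.

```python
class VersionedDeployments:
    """Toy model: every deploy saves a versioned image; rolling back
    reloads the previous one. Data volumes are outside this model."""

    def __init__(self):
        self.history = []  # versioned image tags, oldest first

    def deploy(self, image):
        self.history.append(image)
        return image

    def current(self):
        return self.history[-1] if self.history else None

    def rollback(self):
        """Discard the newest image and return the one now in use."""
        if len(self.history) < 2:
            raise RuntimeError("no older image to roll back to")
        self.history.pop()
        return self.history[-1]
```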
Assimilate (contd.)
- HTTP APIs
   - OpenCPU, rapier
- Docker containers
   - Rocker
- Command line tools
   - Rscript, littler, docopt
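The slide names R tools for this step (Rscript/littler/docopt for the command line, OpenCPU/rapier for HTTP). As a language-neutral sketch of the same pattern, here one analysis function is exposed both as a command line tool and as a JSON HTTP endpoint using only Python's standard library, with argparse standing in for docopt and `http.server` standing in for an API framework. The `summarize` function, the `/summary` route, and the `x` query parameter are hypothetical.

```python
import argparse
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def summarize(values):
    """Hypothetical analysis function standing in for an R model or report."""
    n = len(values)
    return {"n": n, "mean": sum(values) / n if n else None}


class SummaryHandler(BaseHTTPRequestHandler):
    """GET /summary?x=1&x=2 returns the summary as JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/summary":
            self.send_error(404)
            return
        try:
            values = [float(v) for v in parse_qs(url.query).get("x", [])]
        except ValueError:
            self.send_error(400, "x must be numeric")
            return
        body = json.dumps(summarize(values)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main(argv):
    parser = argparse.ArgumentParser(description="Summarize numbers.")
    parser.add_argument("values", nargs="*", type=float, help="numbers to summarize")
    parser.add_argument("--serve", type=int, metavar="PORT", help="run as an HTTP API instead")
    args = parser.parse_args(argv)
    if args.serve is not None:
        # Blocks until interrupted; serves the same function over HTTP
        HTTPServer(("127.0.0.1", args.serve), SummaryHandler).serve_forever()
    else:
        print(json.dumps(summarize(args.values)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as, say, `summarize.py`, running `python summarize.py 1 2 3` prints `{"n": 3, "mean": 2.0}`, while `python summarize.py --serve 8000` answers `curl "http://127.0.0.1:8000/summary?x=1&x=2"`. Keeping the logic in one plain function is what lets the same code also be packaged into a Docker container unchanged.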