Agile Data Processing Pipelines - Software Engineering Radio · AGILE DATA PROCESSING PIPELINES Ken...

Post on 28-May-2020

AGILE DATA PROCESSING PIPELINES

Ken Collier, PhD

Director, Agile Analytics

@theagilist #thoughtworks

Conventional Architectures

Pull-based Batch Loads

Enterprise Data Models

Complex ETL Logic

Poorly Suited to Non-Relational Data

Emergent design is difficult

DESIGN PRINCIPLES

Enable cheap/easy data ingestion

Enable inexpensive scaling

Enable emergent design

Enable easy recreation of information

Drive logic closer to the application

Enable near real time presentation

Support polyglot persistence

DATA CORE

Raw factual data

Historized events

Retain business keys

Data lineage
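The data core slide can be read as an append-only store of raw events. The sketch below is one possible interpretation, not the speaker's implementation: the names `CoreEvent`, `DataCore`, and every field are hypothetical and chosen only to show how raw payloads, event history, business keys, and lineage might sit together in one record.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass(frozen=True)
class CoreEvent:
    """One raw fact in the data core (hypothetical shape)."""
    business_key: str         # key as the source system knows it
    event_type: str
    payload: Dict[str, Any]   # raw, untransformed source data
    source_system: str        # lineage: where the fact came from
    occurred_at: datetime     # when the fact happened in the business
    recorded_at: datetime     # lineage: when the core received it


class DataCore:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[CoreEvent] = []

    def append(self, event: CoreEvent) -> None:
        self._events.append(event)

    def history(self, business_key: str) -> List[CoreEvent]:
        """Return every event for one business key, oldest first."""
        matches = [e for e in self._events if e.business_key == business_key]
        return sorted(matches, key=lambda e: e.occurred_at)
```

Because nothing is overwritten, a change to a customer or order shows up as a new event rather than an update, which is what keeps the history available.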

DATA INGESTION

Event driven

Message queue

Trickle feed
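The ingestion slide contrasts with the pull-based batch loads listed under conventional architectures: sources push events onto a queue as they happen, and a consumer trickles them into storage. A minimal in-process sketch follows, using Python's standard `queue` module as a stand-in for a real message broker. The function names `publish` and `drain` are assumptions made for illustration.

```python
import queue
from typing import Any, Dict, List

Event = Dict[str, Any]


def publish(q: "queue.Queue[Event]", event: Event) -> None:
    """Source side: push an event onto the queue as soon as it occurs."""
    q.put(event)


def drain(q: "queue.Queue[Event]", sink: List[Event]) -> int:
    """Consumer side: move every queued event into the sink, in arrival order.

    Returns the number of events moved. A real consumer would run this
    continuously instead of waiting for a nightly batch window.
    """
    moved = 0
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            return moved
        sink.append(event)
        moved += 1
```

The source never waits for the consumer and the consumer never polls the source system directly, which is the decoupling a message queue is meant to provide.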

INFORMATION PUBLISHING

Topical queues

MDM concerns

Data governance

Post processing
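"Topical queues" suggests that published information is split by subject so each consumer reads only what it needs. The sketch below shows that idea with one in-memory queue per topic. `TopicPublisher` and its methods are hypothetical names, and a production system would use a broker rather than `collections.deque`.

```python
from collections import defaultdict, deque
from typing import Any, DefaultDict, Deque, Optional


class TopicPublisher:
    """Keeps one FIFO queue per topic (hypothetical illustration)."""

    def __init__(self) -> None:
        self._topics: DefaultDict[str, Deque[Any]] = defaultdict(deque)

    def publish(self, topic: str, message: Any) -> None:
        """Append a message to the queue for its topic."""
        self._topics[topic].append(message)

    def poll(self, topic: str) -> Optional[Any]:
        """Return the oldest message on a topic, or None if there is none."""
        pending = self._topics.get(topic)
        if not pending:
            return None
        return pending.popleft()
```

A sales application would poll only the topics it cares about and never see messages published for other subjects.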

INFORMATION TIER

Purpose built data subsets

Transformation

Post processing
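One way to read the information tier alongside the principle "Enable easy recreation of information" is that each purpose-built subset is a pure function of the raw events, so it can be thrown away and rebuilt at any time. The sketch below assumes a hypothetical event shape (`type`, `customer_id`, `amount_cents`) and a hypothetical view, order totals per customer.

```python
from typing import Any, Dict, Iterable

Event = Dict[str, Any]


def build_order_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Derive a purpose-built subset: total order value per customer, in cents.

    The function reads raw events and keeps no state of its own, so running
    it again over the same events always recreates the same result.
    """
    totals: Dict[str, int] = {}
    for event in events:
        if event["type"] != "order_placed":
            continue  # this view only needs order events
        customer = event["customer_id"]
        totals[customer] = totals.get(customer, 0) + event["amount_cents"]
    return totals
```

If the transformation logic changes, the new version is run over the same raw events and the old subset is replaced, which is what makes emergent design cheap in this layout.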

PRESENTATION TIER

Business value applications

Data services

Ad hoc querying

Write back?

Transformation Logic

Data Post Processing

Near Real Time Feed

Emergent Design & Agile Delivery

Apache Kafka

Apache Storm

For questions or suggestions:

Ken Collier kcollier@thoughtworks.com

Follow @theagilist @thoughtworks

THANK YOU