Hadoop meets Agile! - An Agile Big Data Model
Author: uwe-printz (Category: Data & Analytics)
Diagram: Data Sources, Data Systems and Applications
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP, …); New Sources (Logs, Mails, Sensor, …, Social Media)
Data Systems: Traditional Systems (RDBMS, EDW, MPP, …); Enterprise Hadoop Platform
Applications: Business Intelligence, Business Applications, Custom Applications
Operation: Manage & Monitor
Dev Tools: Build & Test
#1 The Vision of the Big Data Lake
A Hadoop project feels just like yet another data warehouse project, except for the knowledge
#1 Vision & Reality
#1 Real world architecture - Insurance
Diagram: Data Sources, Data Systems and Applications
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP, …); New Sources (Logs, Sensor, …, Social Media)
Data Systems: Traditional Systems (DWH); Enterprise Hadoop Platform with Data Ingestion, a Speed Layer (ms to s) and a Batch Layer (min to h), Data Processing, Data Storage and Data Analysis, connected through Data Channels
Applications: Business Intelligence, SAS LASR Server, Apache Zeppelin, Visualization, …
#2 Lambda in Action - (e)Commerce
Data Ingestion
Data Processing
Raw Data
#2 Cassandra & Hadoop - AdServing
Diagram components: Data Ingestion; Batch Layer (min to h) with Data Processing of the User Journey into Aggregated Data; Speed Layer (ms to s) with Stream Processing; Aggregated Data (< 120 days) for the Web Frontend; Data Science
#3 Fraud detection - Financial services
Workflow: Data Import → Data Preparation → Model Generation → Model Validation
Feature & Parameter Selection: manual or automatic iterations to tune parameters
Use Model
Refresh Model from latest input data
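The fraud-detection workflow above can be illustrated with a toy Python sketch. It rests on loudly stated assumptions and is not the method used in the project: transactions are hypothetical `(amount, is_fraud)` pairs, the "model" is a single amount threshold, and the automatic tuning iterations are a plain search over candidate thresholds scored by F1 on a held-out validation split.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record format: (transaction amount, is_fraud label).
Record = Tuple[float, bool]


@dataclass
class ThresholdModel:
    """Toy fraud model: flag every transaction above a fixed amount."""
    threshold: float

    def predict(self, amount: float) -> bool:
        return amount > self.threshold


def prepare(raw: List[Record]) -> List[Record]:
    # Data preparation: drop records with a missing or negative amount.
    return [(a, y) for a, y in raw if a is not None and a >= 0]


def f1(model: ThresholdModel, data: List[Record]) -> float:
    # Model validation metric: harmonic mean of precision and recall.
    tp = sum(1 for a, y in data if model.predict(a) and y)
    fp = sum(1 for a, y in data if model.predict(a) and not y)
    fn = sum(1 for a, y in data if not model.predict(a) and y)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train(raw: List[Record]) -> ThresholdModel:
    data = prepare(raw)
    split = len(data) // 2
    train_set, valid_set = data[:split], data[split:]
    # Model generation: one candidate threshold per distinct training amount.
    candidates = [ThresholdModel(a) for a in sorted({a for a, _ in train_set})]
    # Automatic tuning iterations: keep the candidate with the best validation F1.
    return max(candidates, key=lambda m: f1(m, valid_set))


if __name__ == "__main__":
    history = [(20.0, False), (900.0, True), (150.0, False), (1300.0, True),
               (35.0, False), (870.0, True), (140.0, False), (1050.0, True)]
    model = train(history)  # data import, preparation, generation, validation
    # Use model: score new transactions.
    print(model.threshold, [model.predict(a) for a in (100.0, 500.0)])
```

With the sample history, the search settles on a threshold of 150.0, which separates all validation transactions correctly. Refreshing the model from the latest input data amounts to calling `train` again on the updated history.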
Trade-offs for a Hadoop Platform
Cost Efficiency
Flexibility
Speed of Provisioning
The successful companies will be those that manage to build maximum flexibility and speed of provisioning into their platform without creating yet another silo, all while keeping costs under control
Support for different speeds
A modern Hadoop platform needs to handle different processing speeds to enable different use cases.
Chart: Size of data (KB to TB) vs. Speed of data processing (h to ms), spanning Batch, Interactive, Streaming and Realtime processing; Batch Layer and Speed Layer, fed by Data Ingestion through Data Channels
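The Batch Layer / Speed Layer split used in these diagrams corresponds to the Lambda architecture. Below is a minimal sketch of how a query combines both layers, assuming a hypothetical page-view count as the use case. The function and class names are illustrative only and do not correspond to any Hadoop, Spark or Storm API.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, int]  # hypothetical (page, timestamp) page-view event


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    # Batch layer (minutes to hours): recompute the full view from all raw events.
    return Counter(page for page, _ in master_dataset)


class SpeedLayer:
    """Speed layer (milliseconds to seconds): incremental counts since the last batch run."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> None:
        page, _ = event
        self.counts[page] += 1

    def reset(self) -> None:
        # Discard real-time counts once a fresh batch view already includes them.
        self.counts.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    # Serving side: merge the complete-but-stale batch view with the fresh real-time view.
    return batch[page] + speed.counts[page]
```

For instance, with a batch view over `[("home", 1), ("cart", 2), ("home", 3)]` and one new `("home", 4)` event in the speed layer, `query("home", ...)` returns 3. After the next batch run includes that event, calling `reset()` keeps the total at 3 instead of counting the event twice.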
The Microservices of Hadoop
Data-centric, in pipelines you have to think!
Diagram: Producers → Data Ingestion → Data Storage & Analysis → Visualization & Consumers, organized as Data Pipelines A, B and C on the Hadoop Platform
Producers: Batch Data (MS SQL, MySQL, Oracle, csv); Streaming Data (JMS, Events, …)
Data Storage & Analysis: HDFS (redundant, reliable storage); YARN (Data Operating System); Interactive Parallel Processing with SQL (Hive), In-Memory (Spark), Search (Solr), Others (…)
Visualization & Consumers: Ambari Views & Zeppelin (Visualization), Spark R, MS SQL 2016 + R
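To make the pipeline idea concrete, here is a minimal Python sketch, assuming each pipeline is simply a chain of plain functions (ingestion, analysis, delivery) that one project team owns end to end. The stage names, the `customer,amount` CSV format and the `pipeline` helper are hypothetical; on the actual platform these stages would run on tools such as Hive or Spark rather than in-process Python.

```python
from typing import Any, Callable, Iterable, List

Stage = Callable[[Iterable[Any]], Iterable[Any]]


def pipeline(*stages: Stage) -> Callable[[Iterable[Any]], List[Any]]:
    """Compose stages left to right into one runnable pipeline."""
    def run(records: Iterable[Any]) -> List[Any]:
        for stage in stages:
            records = stage(records)
        return list(records)
    return run


def ingest_csv(lines):
    # Data ingestion: parse "customer,amount" lines, skipping the header.
    rows = iter(lines)
    next(rows, None)
    for line in rows:
        customer, amount = line.split(",")
        yield {"customer": customer, "amount": float(amount)}


def aggregate(rows):
    # Analysis: total amount per customer.
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


def to_report(rows):
    # Delivery to consumers: one formatted line per customer.
    return [f"{r['customer']}: {r['total']:.2f}" for r in rows]


# Two independent pipelines that reuse only the ingestion stage.
pipeline_a = pipeline(ingest_csv, aggregate, to_report)
pipeline_b = pipeline(ingest_csv, lambda rows: [r for r in rows if r["amount"] > 5])
```

Running `pipeline_a(["customer,amount", "alice,10", "bob,5.5", "alice,2.5"])` yields `["alice: 12.50", "bob: 5.50"]`. Because the two pipelines share only the ingestion stage, either team can change its analysis and delivery stages without touching the other pipeline.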
Core principles and values
• The core beliefs are the agile principles
• The foundation is a data-centric role model aligned with the domains of the Big Data Platform
• Independent project teams deliver data pipelines end to end
• The project teams collaborate with specialized Big Data roles
• The data model is built on the principles of domain driven design (DDD)
• Data Governance is built on self-organization
Role model
Diagram labels: Big Data Platform, Raw Data, Operational Data, Analytical Data, “Hidden treasures”, Answers to Questions
• Data Engineers load and transform data
• Data Analysts process data
• Data Scientists analyze and correlate data
• Admins maintain, enhance, scale
• Data Stewards are responsible for the data quality in one domain
Data model
Diagram: Domain A (Data X, project data) and Domain B (Data Y, project data); Data Steward A is responsible for Domain A and Data Steward B for Domain B; Project A uses the data
• The data model is based on the principles of Domain Driven Design (DDD)
• The data is divided into domains; the smallest domain is user data
  • User and project teams are directly responsible for their own data
  • They can use other existing data
• Data is bundled into comprehensive domains, e.g.
  • Business domains
  • National subsidiaries
• Domains can be hierarchical
• Responsibility for a domain lies with exactly one data steward
• Always attach metadata to the user data
  • If nothing else is possible, do it informally; otherwise use an automated tool
• Don’t strive for a unified data model!
  • Redundancy is not forced, but accepted as a real-world necessity
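The data-model rules above can be expressed as a small data structure. This is a minimal sketch with hypothetical class and field names, not a real metadata catalog: domains form a hierarchy, each domain has exactly one steward, every data set must carry metadata, and nothing prevents the same data set from being registered in several domains, since redundancy is accepted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataSet:
    name: str
    owner_team: str            # user or project team directly responsible for the data
    metadata: Dict[str, str]   # informal notes or tool-generated, but never empty


@dataclass
class Domain:
    name: str
    steward: str               # exactly one responsible data steward per domain
    parent: Optional["Domain"] = None
    datasets: List[DataSet] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.steward:
            raise ValueError(f"domain {self.name!r} needs a data steward")

    def add(self, ds: DataSet) -> None:
        # Always attach metadata to the user data.
        if not ds.metadata:
            raise ValueError(f"data set {ds.name!r} has no metadata")
        # No uniqueness check across domains: redundancy is accepted.
        self.datasets.append(ds)

    def path(self) -> str:
        # Hierarchical domain name, e.g. "Group/Germany".
        return self.name if self.parent is None else f"{self.parent.path()}/{self.name}"


group = Domain("Group", steward="Data Steward A")
germany = Domain("Germany", steward="Data Steward B", parent=group)
claims = DataSet("claims", owner_team="Project A", metadata={"source": "DWH export"})
germany.add(claims)
group.add(claims)  # the same data set in a second domain is allowed
print(germany.path())  # Group/Germany
```

Modelling the steward as a single required field, rather than a list, is one simple way to encode the "exactly one data steward" rule.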
Collaboration model
Diagram: Project Teams (Project A, Project B, Project X, …) use the Big Data Platform; the Architecture Board provides authoritative guidelines; IT Operations, Data Stewards, Business Departments and Data Scientists consult, work with or are part of the project teams, and some of them also run their own projects; Data Stewards are responsible for data domains
Role description: IT Operations
• Operates and monitors the Big Data Platform based on an agreed-upon service level agreement (SLA)
• Keeps the platform up to date in short cycles
• Adds new components and technologies
• Scales the platform
• Has a DevOps mindset
Role description: Project Team
• A project team works on a data pipeline from beginning to end
  • Data pipelines can have different depths
• A project team is independent of other project teams
  • Project teams can collaborate
• A project team needs all the roles required to fulfill its project goal
• A project team has full responsibility for its own data
Example team (Project A): Product Owner, Data Scientist, Data Engineers, Data Analyst
Role description: Architecture Board
• Designs technological guidelines
• Consults on deviations from those guidelines
• Meets on a regular basis with full transparency
• Can consult project teams in their day-to-day business
• Consists of architects, Data Stewards & key members of the project teams
Role description: Data Steward
• Supervises the creation and usage of data and its quality
• Acts as the steering person of a self-organized data governance
• Is responsible for the user data and metadata of (at least) one domain
• Operative role, works closely with all other roles
• Data Stewards form an independent, self-organized team
• Data Stewards are part of the Architecture Board
Role description: Data Scientist
• Independent team of data specialists
• Work as part of project teams, but also have their own tasks, e.g.
  • Scientific assessment of data quality
• Generate project and product ideas
• Consult and work closely with data stewards and business departments
• Still unicorns on the job market
Get in contact
Twitter: @uweprintz
Mail: [email protected]
Phone +49 176 1076531
XING: https://www.xing.com/profile/Uwe_Printz
Slide 1: https://unsplash.com/photos/7NtiJBowheE
Slide 2: Copyright by Uwe Printz
Slide 9: https://www.splitshire.com/little-dark-rider/
Slide 10: https://pixabay.com/de/kugelfisch-mexiko-handwerk-seziert-882440/
Slide 15: https://commons.wikimedia.org/wiki/File:Welcome_to_Fabulous_Las_Vegas_Sign.svg
Slide 19: http://unsplash.com/photos/7x4dOkulU9E/
Slide 22: https://unsplash.com/search/unicorn?photo=iWYrCr8eGwU
Slide 38: Copyright by Uwe Printz
All pictures CC0 or shot by the author