Architecture of Big Data Solutions
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Guido Schmutz, Frankfurt, 13.12.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: [email protected]
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Architecture of Big Data Solutions
Agenda
1. Introduction
2. Big Data & Fast Data Reference Architectures
3. Continuous Streaming Data Ingestion
4. Big Data & Cloud
5. Microservices Architecture
6. Big Data Ecosystem – many choices sorted!
Introduction
Big Data Definition (4 Vs)
+ Time to action? Big Data + Real-Time = Stream Processing
Characteristics of Big Data: its Volume, Velocity and Variety in combination
Architecture of Big Data Solutions
[Diagram: traditional flow: Billing & Ordering, CRM / Profile, Marketing Campaigns, Location, Social and Clickstream data; ETL / Stored Procedures into the Enterprise Data Warehouse and Data Marts / Aggregations; BI Tools for Segmentation & Churn Analysis and Marketing Offers]
Traditional Flow Diagram - Challenges
[Diagram: the traditional flow (data sources, ETL / Stored Procedures, Enterprise Data Warehouse, Data Marts / Aggregations, BI Tools), annotated with these challenges:]
• Limited processing power
• Does not map easily onto a traditional database schema
• Storage scaling is very expensive
• Based on samples / limited data
• Loss in fidelity
• Other / new data sources with high volume and velocity
Big Data to the rescue? Why is structure / architecture important?
Why talk about Big Data Architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still a rather young field and therefore a “moving target”
There are no standard architectures that have been in use for many years
In the past years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
Having a reference architecture in place helps in choosing the right/matching technologies
Big Data & Fast Data Reference Architectures
Big Data Architecture
[Diagram: Billing & Ordering, CRM / Profile and Marketing Campaigns data loaded via File Import / SQL Import into a Big Data Cluster (Hadoop Cluster) with Storage (Raw, Refined, Results) and Parallel Processing (Machine Learning, Graph Algorithms, Natural Language Processing); results are consumed via SQL by BI Tools and the Enterprise Data Warehouse, and via Search by Search/Explore and Online & Mobile Apps]
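The Raw, Refined and Results storage areas in the diagram can be read as successive zones holding the same data at different stages. A minimal Python sketch of that idea, assuming hypothetical clickstream records stored as CSV lines; `refine` and `aggregate` are illustrative names, not Hadoop or Spark APIs:

```python
from collections import Counter

# Raw zone: data kept exactly as ingested (hypothetical CSV clickstream lines).
RAW = [
    "2017-12-13T10:00:01,alice,/home",
    "2017-12-13T10:00:05,bob,/products",
    "malformed line",
    "2017-12-13T10:00:09,alice,/products",
]

def refine(raw_lines):
    """Refined zone: parse, validate and structure the raw records."""
    refined = []
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue  # skip unparsable records; the raw zone still keeps them
        ts, user, page = parts
        refined.append({"ts": ts, "user": user, "page": page})
    return refined

def aggregate(refined):
    """Results zone: derived, query-ready data (page views per page)."""
    return Counter(r["page"] for r in refined)

results = aggregate(refine(RAW))
print(dict(results))  # {'/home': 1, '/products': 2}
```

Because the raw zone is never modified, the refined and result zones can always be rebuilt, for example after a parsing bug is fixed.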
Big Data Architecture - Hadoop
[Diagram: the components of the Big Data Architecture, mapped to Hadoop]
Big Data Architecture - Spark
[Diagram: the components of the Big Data Architecture, mapped to Spark]
Event Hub for handling streaming data
[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data) feed an Event Hub; a Data Flow carries the events into the Big Data Cluster (Hadoop Cluster) with Storage (Raw, Refined, Results) and Parallel Processing (Machine Learning, Graph Algorithms, Natural Language Processing); results are consumed via SQL by BI Tools and the Enterprise Data Warehouse, and via Search by Search/Explore and Online & Mobile Apps]
Event Hub for handling streaming data
[Diagram: the same Event Hub and Big Data Cluster setup, with the path from the sources through the Big Data Cluster to the results annotated "high latency"]
“Data at Rest” vs. “Data in Motion”
[Diagram: Data at Rest | Data in Motion]
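The difference can be illustrated by doing the same computation both ways: once over a complete, stored data set (data at rest) and once incrementally as each event arrives (data in motion). A small sketch; the sensor readings and function names are hypothetical:

```python
from typing import Iterable, Iterator

def batch_average(stored: list) -> float:
    """Data at rest: the complete data set is available, compute in one pass."""
    return sum(stored) / len(stored)

class RunningAverage:
    """Data in motion: events arrive one at a time; keep only a small state."""
    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.total += value
        return self.total / self.count  # current result after every event

def stream_average(events: Iterable[float]) -> Iterator[float]:
    avg = RunningAverage()
    for value in events:
        yield avg.update(value)

readings = [20.0, 22.0, 24.0, 26.0]
print(batch_average(readings))         # 23.0: one answer once all data is stored
print(list(stream_average(readings)))  # [20.0, 21.0, 22.0, 23.0]: an answer per event
```

Both paths end at the same value, but the streaming version has a (provisional) answer after every event instead of only at the end.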
Streaming Analytics Architecture
[Diagram: the same sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data) feed an Event Hub; a Data Flow carries the events into a Stream Processing Cluster running Stream Analytics (Low Latency Processing, Alerting, "Real-Time" Dashboard) with Reference / Models; Results go to a Dashboard and to BI Tools, the Enterprise Data Warehouse, Search/Explore, Search and Online & Mobile Apps]
Streaming Analytics Architecture – Open Source
[Diagram: the Streaming Analytics Architecture, mapped to open source components]
Streaming Analytics Architecture
[Diagram: the Streaming Analytics Architecture, annotated "low latency without keeping raw data/events"]
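"Low latency without keeping raw data/events" can be pictured as a sliding-window check: each event updates a small window, an alert fires as soon as a condition holds, and older events are simply dropped. A sketch assuming a hypothetical count-based window of 3 events and a threshold of 50; real stream processing engines offer richer (for example time-based) windows:

```python
from collections import deque

class WindowAlert:
    """Count-based sliding window over the most recent events.

    Only the last `size` values are kept; older raw events are discarded.
    """
    def __init__(self, size: int, threshold: float) -> None:
        self.window = deque(maxlen=size)  # hypothetical window size
        self.threshold = threshold        # hypothetical alert threshold

    def on_event(self, value: float):
        self.window.append(value)  # the oldest value falls out automatically
        avg = sum(self.window) / len(self.window)
        if len(self.window) == self.window.maxlen and avg > self.threshold:
            return f"ALERT: average {avg:.1f} above {self.threshold}"
        return None

detector = WindowAlert(size=3, threshold=50.0)
alerts = [a for a in map(detector.on_event, [40, 45, 50, 60, 70]) if a]
print(alerts)
# ['ALERT: average 51.7 above 50.0', 'ALERT: average 60.0 above 50.0']
```

The trade-off shown on the slide is visible here: once a value has left the window it is gone, so a later question about the full history cannot be answered from this component alone.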
Keep raw event data
[Diagram: the same sources feed an Event Hub that supplies both an Event Processing Cluster (Stream Analytics with Reference / Models, Results shown on a Dashboard) and a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing), which also receives File Import / SQL Import; consumers: BI Tools, Enterprise Data Warehouse, Search/Explore, Search, Online & Mobile Apps]
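Keeping the raw events alongside stream processing works because an event hub retains events and lets several consumers read them independently, each from its own position. A simplified in-memory sketch of that idea; `EventHub` and `Consumer` are hypothetical classes, not the API of Kafka or any other product:

```python
class EventHub:
    """In-memory stand-in for an event hub topic: an append-only log."""
    def __init__(self) -> None:
        self.log: list = []

    def publish(self, event: dict) -> int:
        self.log.append(event)
        return len(self.log) - 1  # offset of the new event

    def read(self, offset: int, max_events: int = 100) -> list:
        return self.log[offset:offset + max_events]

class Consumer:
    """Each consumer tracks its own offset, so consumers do not affect each other."""
    def __init__(self, hub: EventHub, handler) -> None:
        self.hub, self.handler, self.offset = hub, handler, 0

    def poll(self) -> None:
        batch = self.hub.read(self.offset)
        for event in batch:
            self.handler(event)
        self.offset += len(batch)  # advance position after processing

hub = EventHub()
raw_store: list = []        # stands in for the Raw zone of the Big Data Cluster
running_total = {"sum": 0}  # stands in for the Stream Analytics state

def add_to_total(event: dict) -> None:
    running_total["sum"] += event["amount"]

analytics = Consumer(hub, add_to_total)    # low-latency path
archiver = Consumer(hub, raw_store.append)  # keeps every raw event

for amount in (10, 20, 30):
    hub.publish({"amount": amount})

analytics.poll()
archiver.poll()
print(running_total["sum"], len(raw_store))  # 60 3
```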
“Lambda Architecture” for Big Data
[Diagram: the same sources feed an Event Hub, which supplies both an Event Processing Cluster (Stream Analytics with Reference / Models, Results shown on a Dashboard) and a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing), which also receives File Import / SQL Import; results are consumed via SQL and Search by BI Tools, the Enterprise Data Warehouse, Search/Explore and Online & Mobile Apps]
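The Lambda Architecture is commonly described as a batch layer that periodically recomputes views from all stored data, a speed layer that covers only the events since the last batch run, and a serving layer that merges both at query time. A minimal sketch of that idea, using hypothetical page-view events; the function and class names are illustrative only:

```python
from collections import Counter

def batch_view(master_dataset: list) -> Counter:
    """Batch layer: recompute the view from all stored (raw) events."""
    return Counter(master_dataset)

class SpeedLayer:
    """Speed layer: incrementally counts events not yet in the batch view."""
    def __init__(self) -> None:
        self.realtime_view = Counter()

    def on_event(self, key: str) -> None:
        self.realtime_view[key] += 1

    def reset(self) -> None:
        # Called once a new batch view includes these events.
        self.realtime_view.clear()

def query(key: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch view and the real-time view."""
    return batch[key] + speed.realtime_view[key]

master = ["page_a", "page_b", "page_a"]  # events already covered by the batch run
batch = batch_view(master)
speed = SpeedLayer()
for event in ["page_a", "page_c"]:  # events arriving after the last batch run
    master.append(event)            # every event also lands in the master dataset
    speed.on_event(event)

print(query("page_a", batch, speed))  # 3 (2 from batch + 1 from speed layer)

batch = batch_view(master)  # the next batch run catches up ...
speed.reset()               # ... so the speed layer can be cleared
print(query("page_a", batch, speed))  # 3
```

The price of this design is that the same logic exists twice (in `batch_view` and `SpeedLayer`), which is the usual motivation given for the Kappa Architecture.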
“Kappa Architecture” for Big Data
[Diagram: the same components as in the Lambda Architecture (sources, Event Hub, Event Processing Cluster with Stream Analytics, Big Data Cluster with Raw, Refined and Results storage, File Import / SQL Import, SQL and Search consumers), arranged for the Kappa Architecture]
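The Kappa Architecture, as it is usually described, drops the separate batch layer: all data flows through the event hub, and reprocessing means replaying the retained event log with a new version of the same stream processing job. A sketch under that assumption; the temperature events and the `hot_v1` / `hot_v2` job versions are hypothetical:

```python
from typing import Callable

# The event hub retains the full event log (hypothetical temperature events).
EVENT_LOG = [
    {"sensor": "s1", "temp": 21.0},
    {"sensor": "s2", "temp": 35.0},
    {"sensor": "s1", "temp": 38.0},
]

def run_job(log: list, process: Callable[[dict], bool]) -> list:
    """One stream processing code path; recomputation is just a replay."""
    output = []
    for event in log:  # replay from the beginning of the retained log
        if process(event):
            output.append(event)
    return output

def hot_v1(event: dict) -> bool:
    return event["temp"] > 30.0

def hot_v2(event: dict) -> bool:  # new version of the logic: changed threshold
    return event["temp"] > 36.0

current_view = run_job(EVENT_LOG, hot_v1)
# To change the logic, replay the log with the new version into a new output,
# then switch consumers over once it has caught up.
new_view = run_job(EVENT_LOG, hot_v2)
print(len(current_view), len(new_view))  # 2 1
```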
“Unified Architecture” for Big Data
[Diagram: the same sources, plus File Import / SQL Import, feed an Event Hub and a single Big Data Cluster (Hadoop Cluster) hosting both Batch Analytics (Parallel Processing; Storage: Raw, Refined, Results) and Streaming Analytics (Stream Analytics, NoSQL, Reference / Models); outputs go to a Dashboard and, via SQL and Search, to BI Tools, the Enterprise Data Warehouse, Search/Explore and Online & Mobile Apps]
Continuous Streaming Data Ingestion
Continuous Data Ingestion
[Diagram: the Unified Architecture, repeated to introduce the ingestion of the sources into the Event Hub]
Continuous Streaming Data Ingestion
[Diagram: sources (DB Source, File Source / Log, IoT Sensors, Social, Queue) connected through gateways and connectors (CDC, Log CDC, CDC GW, Connect, IoT GW, REST, Native, Dataflow, Dataflow GW, Message GW) to Topics in the Event Hub, which feeds Stream Processing and Big Data]
Continuous Streaming Data Ingestion
SQL Polling
Change Data Capture (CDC)
File Polling
File Stream (File Tailing)
File Stream (Appender)
Sensor Stream
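SQL Polling typically means repeatedly querying a source table for rows newer than the last one seen (a "high-water mark"). A minimal sketch using Python's built-in `sqlite3` as a stand-in source database; the `orders` table and its columns are hypothetical, and in a real setup each new row would be published to an Event Hub topic inside a timed polling loop:

```python
import sqlite3

def poll_new_rows(conn: sqlite3.Connection, last_id: int):
    """SQL polling: fetch only rows added since the last poll (high-water mark)."""
    rows = conn.execute(
        "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [("alice", 10.0), ("bob", 25.0)])

last_id = 0
first, last_id = poll_new_rows(conn, last_id)   # picks up both existing rows
conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("carol", 5.0))
second, last_id = poll_new_rows(conn, last_id)  # picks up only the new row
print(len(first), len(second), last_id)         # 2 1 3
```

Polling on an increasing key only sees inserts; updates and deletes go unnoticed, which is one reason Change Data Capture reads the database log instead.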
Big Data & Cloud
Data Locality vs. Compute/Storage Separation
[Diagram: Data Local Compute: a Master Node and Workers #1 to #3, each combining Processing with a local Disk, connected by the Network. Separate Compute and Storage: a Master Node and Compute #1 to #3 (Processing only) accessing the Disks of a shared Storage over the Network]
Separation of compute and storage: the fundamental difference
• store data in Object Storage instead of a DFS
• bring up Compute nodes only for data processing
• multiple workloads on separate clusters can access the same data
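The three points above can be sketched with a shared storage object and compute "clusters" that hold no data of their own. This is an in-memory illustration only; `ObjectStorage` and `ComputeCluster` are hypothetical stand-ins, not the API of S3 or any cloud service:

```python
class ObjectStorage:
    """In-memory stand-in for an object store (key -> bytes), shared by all clusters."""
    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> list:
        return sorted(k for k in self._objects if k.startswith(prefix))

class ComputeCluster:
    """Compute nodes keep no data locally; they read from the shared storage."""
    def __init__(self, name: str, storage: ObjectStorage) -> None:
        self.name, self.storage = name, storage

    def run(self, prefix: str, job):
        records = [self.storage.get(k).decode() for k in self.storage.list_keys(prefix)]
        return job(records)

storage = ObjectStorage()
storage.put("raw/sales/part-0", b"10")
storage.put("raw/sales/part-1", b"32")

# Two independent (and possibly short-lived) clusters run different workloads
# on the same data without copying it.
reporting = ComputeCluster("reporting", storage)
data_science = ComputeCluster("data-science", storage)
total = reporting.run("raw/sales/", lambda recs: sum(int(r) for r in recs))
count = data_science.run("raw/sales/", len)
print(total, count)  # 42 2
```

Discarding either cluster object leaves the data untouched, which is what makes bringing up compute only for processing practical.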
A new way to Manage Big Data
Big Data: Traditional Assumptions
• Bare-metal
• Data Locality
• HDFS on local disks
Big Data: A New Approach
• Containers and VMs
• Compute and storage separation
• Shared storage
Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights
Big Data & Cloud - Amazon Web Services (AWS)
[Diagram: the Unified Architecture, mapped to AWS]
Microservices Architecture
Asynchronous Microservice Architecture
[Diagram: the same sources feed an Event Hub; a Microservice Cluster of Services, each with its own Microservice State and API, and a Stream Analytics Cluster of Stream Processors, each with its own State and API, exchange Event Streams through the Event Hub; a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing) with File Import / SQL Import serves SQL and Search consumers (BI Tools, Enterprise Data Warehouse, Search/Explore, Online & Mobile Apps)]
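In this style, services do not call each other synchronously: one service publishes events, another consumes the event stream asynchronously, keeps its own local state and answers queries through its own API. A sketch using the standard library's `queue` and `threading` as a stand-in for the Event Hub; the service names, event type and `api_total_spent` method are hypothetical:

```python
import queue
import threading

event_stream: queue.Queue = queue.Queue()  # stands in for an Event Hub topic

class OrderService:
    """Publishes events instead of calling other services directly."""
    def place_order(self, customer: str, amount: float) -> None:
        event_stream.put({"type": "OrderPlaced", "customer": customer, "amount": amount})

class CustomerStatsService:
    """Consumes the event stream and keeps its own local state (no shared database)."""
    def __init__(self) -> None:
        self._state = {}
        self._lock = threading.Lock()

    def consume(self) -> None:
        while True:
            event = event_stream.get()
            if event is None:  # sentinel used only to stop this sketch
                break
            if event["type"] == "OrderPlaced":
                with self._lock:
                    c = event["customer"]
                    self._state[c] = self._state.get(c, 0.0) + event["amount"]

    def api_total_spent(self, customer: str) -> float:
        """Query API served from the service's local state."""
        with self._lock:
            return self._state.get(customer, 0.0)

orders, stats = OrderService(), CustomerStatsService()
worker = threading.Thread(target=stats.consume)
worker.start()
orders.place_order("alice", 30.0)
orders.place_order("alice", 12.5)
orders.place_order("bob", 7.0)
event_stream.put(None)
worker.join()
print(stats.api_total_spent("alice"), stats.api_total_spent("bob"))  # 42.5 7.0
```

`OrderService` does not know who consumes its events, so further consumers (for example a Stream Processor) can be added without changing it.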
Big Data Ecosystem – many choices sorted!