Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of Cloud
Demi Ben-Ari - VP R&D @ Panorays
About Me
Demi Ben-Ari, Co-Founder & VP R&D @ Panorays
● Google Developer Expert
● Co-Founder of Communities:
○ “Big Things” - Big Data, Data Science, DevOps
○ Google Developer Group Cloud
○ Ofek Alumni Association
In the Past:
● Sr. Data Engineer - Windward
● Team Leader & Sr. Java Software Engineer, Missile defence and Alert System - “Ofek” – IAF
Automate the Security Management of Third Parties
Capture the Hacker’s View
Get Realtime Ratings
Comply with Regulations
Say “Distributed”, Say “Big Data”, Say….
What is Big Data (IMHO)? And What to Monitor?
● Systems involving the “3 Vs”: What are the right questions we want to ask?
○ Volume - How much?
○ Velocity - How fast?
○ Variety - What kind? (Difference)
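As a rough illustration of turning the three questions into numbers, the sketch below summarises a batch of records by volume, velocity, and variety. The record layout (a numeric `ts` in seconds and a `type` field) and the function name `three_vs` are assumptions made for this example and do not come from the talk.

```python
from collections import Counter


def three_vs(records):
    """Summarise a batch of records along the "3 Vs".

    Each record is assumed (hypothetically) to be a dict with a numeric
    'ts' field in seconds and a 'type' field.
    """
    volume = len(records)  # Volume: how much?
    if volume == 0:
        return {"volume": 0, "velocity": 0.0, "variety": {}}

    timestamps = [r["ts"] for r in records]
    span = max(timestamps) - min(timestamps)
    # Velocity: records per second over the observed time span.
    # If every record shares one timestamp, fall back to the raw count.
    velocity = volume / span if span > 0 else float(volume)

    # Variety: how many records of each kind?
    variety = dict(Counter(r["type"] for r in records))
    return {"volume": volume, "velocity": velocity, "variety": variety}


sample = [
    {"ts": 0, "type": "position"},
    {"ts": 5, "type": "position"},
    {"ts": 10, "type": "static"},
    {"ts": 20, "type": "position"},
]
summary = three_vs(sample)
```

In a real system these numbers would more likely come from the storage and ingestion layers than from an in-memory list; the point is only that each "V" maps to something measurable.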
What has happened in recent years?
● Storage got cheaper
● The amount of data grew exponentially
● Cloud service providers grew rapidly
● Connectivity got much easier
● Cloud made “on demand” computation possible
● “Compute” started moving to the “Data” and not the other way around.
Situations & Problems
https://imgflip.com/i/1ap5kr
http://kingofwallpapers.com/otter/otter-004.jpg
MongoDB + Spark
[Diagram: a Spark Cluster (Master, Worker 1, Worker 2, …, Worker N) reading from and writing to a Sharded MongoDB (Master, Replica Set)]
Cassandra + Spark
[Diagram: a Spark Cluster (Worker 1, Worker 2, …, Worker N) reading from and writing to a Cassandra Cluster]
Cassandra + Serving
[Diagram: UI Clients calling Web Services, which read from and write to a Cassandra Cluster]
Distributed Microservices Architecture
[Diagram: Service A, Service B, Service C, and a Web Server, connected to a Queue, several DBs and Caches, and an Analytics Cluster (one Master, three Slaves)]
Monitoring System???
Did someone say Containers?
Docker Environments
● Docker?
● Orchestration?
VS
● Wait, what about local mode?
○ Minikube vs Docker Engine
Problems
● Multiple physical servers
● Multiple logical services
● Want Scaling => More Servers
Data flow and Environment (Use Case)
Structure of the Data
● Maritime Analytics Platform
● Geo Locations + Metadata
● Arriving over time
● Different types of messages being reported by satellites
● Encoded (For compression reasons)
● Might arrive later than actually transmitted
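Because messages might arrive later than they were transmitted, a pipeline has to distinguish the time a message was sent from the time it was received. The sketch below shows one simple way to do that. The `Message` fields, the delay threshold, and the function names are all hypothetical and are not taken from the platform described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Message:
    vessel_id: str         # hypothetical identifier of the reporting vessel
    transmitted_at: float  # event time: when the message was sent (seconds)
    received_at: float     # arrival time: when the pipeline got it (seconds)


def split_late(messages, max_delay_s):
    """Separate messages that arrived more than max_delay_s after being sent."""
    on_time, late = [], []
    for m in messages:
        delay = m.received_at - m.transmitted_at
        (late if delay > max_delay_s else on_time).append(m)
    return on_time, late


def in_event_order(messages):
    """Order messages by when they were transmitted, not when they arrived."""
    return sorted(messages, key=lambda m: m.transmitted_at)


msgs = [
    Message("A", 100, 105),
    Message("B", 50, 400),   # arrived 350 s after transmission
    Message("A", 90, 95),
]
on_time, late = split_late(msgs, max_delay_s=60)
ordered = in_event_order(msgs)
```

Late messages usually mean that results already computed for that time range need to be recomputed, which is one reason to keep the raw data around.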
Data Flow Diagram
[Diagram: External Data Source; Analytics Layers within the Data Pipeline: Parsed Raw, Entity Resolution Process, Building insights on top of the entities; Data Output Layer: Anomaly Detection, Trends; UI for End Users]
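The layers in this diagram can be pictured as a chain of small functions, each producing the input of the next. The sketch below is a toy version only: it assumes raw messages are JSON lines carrying an `id` field, and it stands in for entity resolution with a plain group-by, which is far simpler than a real entity resolution process. All function and field names are hypothetical.

```python
import json
from collections import defaultdict


def parse_raw(lines):
    """Parsed Raw layer: decode each raw line (JSON is an assumption here)."""
    return [json.loads(line) for line in lines]


def resolve_entities(messages):
    """Entity resolution stand-in: group messages that share the same 'id'."""
    entities = defaultdict(list)
    for msg in messages:
        entities[msg["id"]].append(msg)
    return dict(entities)


def build_insights(entities):
    """Insights layer: one summary per entity (here, just a message count)."""
    return {eid: {"messages": len(msgs)} for eid, msgs in entities.items()}


raw = [
    '{"id": "v1", "lat": 1.0, "lon": 2.0}',
    '{"id": "v2", "lat": 3.0, "lon": 4.0}',
    '{"id": "v1", "lat": 1.5, "lon": 2.5}',
]
insights = build_insights(resolve_entities(parse_raw(raw)))
```

Keeping each layer's output separate, rather than only the final result, is what makes it possible to check later whether a layer is missing or incomplete.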
Environment Description
[Diagram: Cluster environments: Dev, Testing, Live Staging, Production; OB1K RESTful Java Services]
Monitoring Your Data
https://memegenerator.net/instance/53617544
Data Questions? What should be measured?
● Did all of the computation occur?
○ Are there any data layers missing?
● How much data do we have? (Volume)
● Is all of the data in the Database?
● Data Quality Assurance
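These questions can be turned into an automated check that runs after each computation. The following is a minimal sketch under stated assumptions: the layer names, the per-layer record counts, and the 1% tolerance are hypothetical, and in practice the counts would come from queries against the actual storage.

```python
def data_health(expected_layers, layer_counts, source_count, db_count,
                tolerance=0.01):
    """Answer the monitoring questions for one pipeline run.

    expected_layers: names of the layers the run should have produced
    layer_counts:    records actually found per layer (missing key = none)
    source_count:    records that were supposed to be written to the database
    db_count:        records actually found in the database
    """
    # Did all of the computation occur? Any layer with no records is missing.
    missing_layers = [name for name in expected_layers
                      if layer_counts.get(name, 0) == 0]

    # How much data do we have? (Volume)
    total_volume = sum(layer_counts.values())

    # Is all of the data in the database, within the allowed tolerance?
    if source_count == 0:
        db_complete = db_count == 0
    else:
        db_complete = abs(source_count - db_count) / source_count <= tolerance

    return {
        "missing_layers": missing_layers,
        "total_volume": total_volume,
        "db_complete": db_complete,
    }


report = data_health(
    expected_layers=["raw", "parsed", "entities", "insights"],
    layer_counts={"raw": 1000, "parsed": 990, "entities": 120},
    source_count=990,
    db_count=950,
)
```

A report like this can be pushed to whatever monitoring system is already in place, so that a missing layer raises an alert in the same way a failed server would.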
Conclusions
● Keep all of the Data that you can
○ In its most raw form
● Duplicating Data is not a bad thing
● On-demand compute will save you much time and money
● Find the relevant tool to solve each problem
○ Not one tool that will solve all of them (No such thing)
● Use the cloud as an auxiliary tool
○ Will boost your productivity considerably
Questions?
● LinkedIn
● Twitter: @demibenari
● Blog: http://progexc.blogspot.com/
● [email protected]
● “Big Things” Community: Meetup, YouTube, Facebook, Twitter
● GDG Cloud