Posted on 12-Apr-2017
The Leader in Big Data Consulting
www.mammothdata.com | @mammothdataco
A Gentle Introduction to Kafka and Storm
{Percona University | Raleigh}
Open Software Integrators
Open Software Integrators is a Big Data consulting and services company specializing in Hadoop, Cassandra, MongoDB and other NoSQL technologies. OSI focuses on executive strategy, initial install, design and implementation.
Founded January 2008 by Andrew C. Oliver
Based in downtown Durham, NC
Partnered with Hortonworks, MongoDB, DataStax, Cloudera, Couchbase, Cloudbees & Neo Technology
A Gentle Introduction
● What Kafka and Storm are
● What they can be used for
● What they excel at
Kafka
Kafka and Storm
What is Apache Kafka?
Kafka is a distributed, partitioned, replicated commit log service.
The Commit Log
An append-only, immutable sequence of records ordered by time.
[Figure: the log as a sequence of records, from the first record to the next record written]
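As a rough illustration of the append-only idea, here is a minimal in-memory sketch. The `CommitLog` class and its method names are invented for this example and are not part of Kafka; a real log is persisted to disk and replicated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommitLog:
    """Toy in-memory append-only log (hypothetical, not Kafka's implementation)."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(bytes(record))
        return len(self._records) - 1

    def read(self, offset: int) -> bytes:
        """Return the record stored at the given offset."""
        if offset < 0 or offset >= len(self._records):
            raise IndexError(f"offset {offset} out of range")
        return self._records[offset]

    def __len__(self) -> int:
        return len(self._records)


log = CommitLog()
first = log.append(b"first record")
second = log.append(b"next written record")
```

There is deliberately no update or delete method: once written, a record keeps its offset, and new records only ever go at the end.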
Kafka is:
● fast
● durable
● distributed
● scalable
Kafka Abstractions
● Topic: feeds of messages in categories
● Broker: a host running Kafka
● Producer: a process that publishes messages
● Consumer: a process that pulls messages
● Partition: portion of a topic’s stream of messages
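To see how these five abstractions relate, the following toy model keeps everything in memory. All class and method names are hypothetical and this is not the Kafka client API; in particular, `zlib.crc32` is a stand-in hash chosen for the sketch, not the hash Kafka itself uses.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class Topic:
    """A named feed of messages split into a fixed number of partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]


class Broker:
    """Stands in for a host running Kafka; here it only holds topics in memory."""

    def __init__(self) -> None:
        self.topics: Dict[str, Topic] = {}

    def create_topic(self, name: str, num_partitions: int) -> Topic:
        self.topics[name] = Topic(name, num_partitions)
        return self.topics[name]


class Producer:
    """Publishes messages; the key decides which partition a message lands in."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker

    def send(self, topic: str, key: bytes, value: bytes) -> Tuple[int, int]:
        t = self.broker.topics[topic]
        # Assumption: crc32 is used only to get a deterministic partition choice.
        partition = zlib.crc32(key) % len(t.partitions)
        t.partitions[partition].append(value)
        return partition, len(t.partitions[partition]) - 1


class Consumer:
    """Pulls messages and tracks its own position (offset) per partition."""

    def __init__(self, broker: Broker, topic: str) -> None:
        self.topic = broker.topics[topic]
        self.offsets: Dict[int, int] = defaultdict(int)

    def poll(self, partition: int) -> List[bytes]:
        log = self.topic.partitions[partition]
        start = self.offsets[partition]
        self.offsets[partition] = len(log)  # remember how far we have read
        return log[start:]


broker = Broker()
broker.create_topic("events", num_partitions=3)
producer = Producer(broker)
p1, o1 = producer.send("events", b"user-1", b"login")
p2, o2 = producer.send("events", b"user-1", b"logout")
consumer = Consumer(broker, "events")
batch = consumer.poll(p1)
```

Both messages share a key, so they land in the same partition in the order they were sent, and the consumer pulls them back in that order.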
What Kafka is used for:
Enterprise-grade event streaming
What Kafka is not good at:
Doing anything other than being a commit log.
Storm
Kafka and Storm
What is Apache Storm?
Storm is a distributed, real time computation system
Stream Processing
Several processes fall into the domain of stream processing:
● AKA Event Sourcing
● Command and Query Responsibility Segregation
● Complex Event Processing
● etc.
What Storm Does
● Simple API
● Guaranteed data processing
● Fault tolerant
● Scalable
● Usable with any language
Storm Abstractions
Three abstractions:
● Spouts
● Bolts
● Topology
[Figure: a topology in which two spouts feed four bolts]
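The relationship between spouts, bolts and a topology can be sketched with plain Python generators. This is not the Storm API: the function names are made up for the example, and the "topology" here is a simple linear chain, whereas a real Storm topology is a graph in which several spouts and bolts can feed one another.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def sentence_spout() -> Iterator[str]:
    """A spout is a source of tuples; this one emits two fixed sentences."""
    yield "kafka feeds storm"
    yield "storm processes streams"


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """A bolt consumes a stream and emits a new one; this one emits words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Keeps a running count per word and emits (word, count) on every input."""
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


def run_topology(spout: Callable[[], Iterator], bolts: List[Callable]) -> list:
    """The topology wires the spout's output through each bolt in turn."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


results = run_topology(sentence_spout, [split_bolt, count_bolt])
```

Each stage only sees the stream handed to it by the previous stage, which is what lets a real system place the stages on different machines.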
Storm Processes
Processes:
● UI
● Nimbus
● Supervisor
● Worker
[Figure: the Web UI and Nimbus, Zookeeper, and two Supervisors, each running two Workers]
Storm Parallelism Model
● Worker process
● Executors
● Tasks
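One way to picture the three levels is as nested containers: worker processes run executors (threads), and executors run tasks. The sketch below spreads executors over workers and tasks over executors round-robin. The `layout` function is hypothetical, and the round-robin placement is an assumption made for illustration; the actual placement is decided by Storm's scheduler.

```python
from typing import Dict, List


def layout(num_workers: int, num_executors: int, num_tasks: int) -> Dict[int, Dict[int, List[int]]]:
    """Return {worker_id: {executor_id: [task_ids]}} using round-robin placement."""
    if num_workers <= 0 or num_executors <= 0:
        raise ValueError("need at least one worker and one executor")
    if num_tasks < num_executors:
        raise ValueError("each executor needs at least one task")

    plan: Dict[int, Dict[int, List[int]]] = {w: {} for w in range(num_workers)}
    # Spread executors across worker processes.
    for executor in range(num_executors):
        plan[executor % num_workers][executor] = []
    # Spread tasks across executors.
    for task in range(num_tasks):
        executor = task % num_executors
        plan[executor % num_workers][executor].append(task)
    return plan


plan = layout(num_workers=2, num_executors=4, num_tasks=8)
```

With two workers, four executors and eight tasks, each worker ends up with two executors and each executor with two tasks.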
Use Case: Security
Kafka and Storm
Use Case: Security
Security customer analytics platform:
● Pulling data from customer sites
● Placing data in a SQL database
● Performing analysis to spot anomalous traffic
● Pushing results back to the client to block traffic sources
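The slides do not say how anomalous traffic was detected. Purely as an illustration of the streaming shape of such an analysis, the sketch below flags a source as soon as its request count passes a fixed limit. The function name, the threshold rule and the sample addresses are all assumptions for this example, not the customer's actual method.

```python
from collections import Counter
from typing import Iterable, Iterator


def anomalous_sources(events: Iterable[str], max_requests: int) -> Iterator[str]:
    """Yield each source address once, at the moment it exceeds max_requests."""
    seen: Counter = Counter()
    flagged = set()
    for source in events:
        seen[source] += 1
        if seen[source] > max_requests and source not in flagged:
            flagged.add(source)
            yield source  # could be pushed back to the client right away


# Hypothetical traffic: one event per request, identified by source address.
traffic = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
blocked = list(anomalous_sources(traffic, max_requests=3))
```

Because the result is emitted while events are still arriving, there is no need to wait for a batch to be loaded into a database before acting on it.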
Use Case: Security
Original system, mean turnaround time: 4.5 hours
Storm / Kafka solution, maximum processing time: 2.6 seconds
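To put the two figures on the same scale, the arithmetic below converts both to seconds. The variable names are only for this example. Note that it compares the old mean with the new maximum, so the ratio is a conservative estimate of the typical improvement.

```python
# Figures taken from the slide: 4.5 hours (mean) versus 2.6 seconds (maximum).
original_mean_seconds = 4.5 * 60 * 60
new_max_seconds = 2.6

# How many times shorter the new worst case is than the old average.
ratio = original_mean_seconds / new_max_seconds
print(f"{original_mean_seconds:.0f} s vs {new_max_seconds} s, about {ratio:.0f}x")
```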
Thank You
Kafka and Storm
Links
Kafka: http://kafka.apache.org/
Storm: http://storm.apache.org/