A Gentle Introduction of Kafka and Storm

Drew Nelson

{Percona University | Raleigh}

{Open Software Integrators} {www.osintegrators.com} {@osintegrators}

Open Software Integrators

• Open Software Integrators is a Big Data consulting and services company specializing in Hadoop, Cassandra, MongoDB and other NoSQL technologies. OSI focuses on executive strategy, initial install, design and implementation.

• Founded January 2008 by Andrew C. Oliver

• Based in downtown Durham, NC

• Partnered with Hortonworks, MongoDB, DataStax, Cloudera, Couchbase, Cloudbees & Neo Technology

A Gentle Introduction

• What are Kafka and Storm?
• What can they be used for?
• What do they excel at?

Kafka

What is Apache Kafka?

Kafka is a distributed, partitioned, replicated commit log service.

The Commit Log

An append-only, immutable sequence of records ordered by time.

[Figure: commit log diagram, labeled from "first record" to "next written record"]
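The append-only idea can be illustrated with a minimal in-memory sketch. This is not Kafka code: the `CommitLog` class and its `append` and `read` methods are hypothetical names chosen for illustration, and a real log would be persisted to disk and replicated.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommitLog:
    """In-memory, append-only log (hypothetical toy, not the Kafka API)."""
    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Add a record at the end and return its offset (its position)."""
        self._records.append(bytes(record))
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, record) pairs starting at offset."""
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, rec) for i, rec in enumerate(chunk)]


log = CommitLog()
first = log.append(b"first record")
second = log.append(b"next written record")
```

There is deliberately no update or delete method: existing records are never changed, and readers simply ask for records from a given offset onward.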

Kafka is:

● fast
● durable
● distributed
● scalable

Kafka abstractions

● Topic: feeds of messages in categories
● Broker: a host running Kafka
● Producer: a process that publishes messages
● Consumer: a process that pulls messages
● Partition: portion of a topic’s stream of messages
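How these abstractions relate can be sketched with a toy, single-process model. None of this is the Kafka client API: the `Broker`, `Producer` and `Consumer` classes, the `"clicks"` topic and the CRC32 key hashing are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class Broker:
    """Toy stand-in for a host running Kafka: topic name -> partitions."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[List[bytes]]] = {}

    def create_topic(self, name: str, partitions: int) -> None:
        # Each partition is its own append-only list of messages.
        self.topics[name] = [[] for _ in range(partitions)]


class Producer:
    """Publishes messages; the key decides which partition gets the message."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker

    def send(self, topic: str, key: bytes, value: bytes) -> int:
        parts = self.broker.topics[topic]
        index = zlib.crc32(key) % len(parts)  # assumption: simple hash routing
        parts[index].append(value)
        return index


class Consumer:
    """Pulls messages and remembers how far it has read in each partition."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.offsets: Dict[Tuple[str, int], int] = defaultdict(int)

    def poll(self, topic: str, partition: int) -> List[bytes]:
        log = self.broker.topics[topic][partition]
        start = self.offsets[(topic, partition)]
        self.offsets[(topic, partition)] = len(log)
        return log[start:]


broker = Broker()
broker.create_topic("clicks", partitions=2)
producer = Producer(broker)
consumer = Consumer(broker)
p1 = producer.send("clicks", b"user-1", b"page A")
p2 = producer.send("clicks", b"user-1", b"page B")
```

Messages with the same key land in the same partition, so a consumer polling that partition sees them in the order they were written.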

What Kafka is used for:

Enterprise-grade event streaming

What Kafka is not good at:

Doing anything other than being a commit log.

Storm

What is Apache Storm?

Storm is a distributed, real-time computation system.

Stream processing

Also known as:
● Event Sourcing
● Command and Query Responsibility Segregation
● Complex Event Processing
● etc.

Several processes fall into the domain of stream processing.

What Storm does

● Simple API
● Guaranteed data processing
● Fault tolerant
● Scalable
● Usable with any language

Storm abstractions

Three abstractions:
● Spouts
● Bolts
● Topology

[Figure: example topology in which two spouts feed four bolts]
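The three abstractions can be imitated in a few lines of plain Python. This is a sketch of the idea only, not the Storm API: the function names are hypothetical, the word-count example is an assumption, and the topology here is a single linear chain, whereas a real topology is a graph that can have several spouts and branching bolts.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List


def sentence_spout() -> Iterator[str]:
    """Spout: a source of tuples (here, a fixed list of sentences)."""
    yield from ["kafka feeds storm", "storm runs bolts"]


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes sentences and emits one word per tuple."""
    for sentence in stream:
        yield from sentence.split()


def count_bolt(stream: Iterable[str]) -> Counter:
    """Bolt: final step that tallies the words it receives."""
    return Counter(stream)


def run_topology(spout: Callable[[], Iterable], bolts: List[Callable]):
    """Topology: the wiring that connects a spout to a chain of bolts."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return stream


counts = run_topology(sentence_spout, [split_bolt, count_bolt])
```

The spout knows nothing about the bolts and each bolt only sees its input stream, so the wiring lives entirely in the topology.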

Storm processes

Processes:
● UI
● Nimbus
● Supervisor
● Worker

[Figure: Storm cluster layout showing the Web UI, Nimbus, Zookeeper, and two Supervisors, each with two Workers]

Storm parallelism model

● Worker process
● Executors
● Tasks
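The three levels nest: a worker process runs executors (threads), and each executor runs one or more tasks. The sketch below only illustrates that nesting. The `deal` and `layout` helpers are hypothetical, and the round-robin placement is an assumption, not a description of Storm's scheduler.

```python
from typing import List, TypeVar

T = TypeVar("T")


def deal(items: List[T], buckets: int) -> List[List[T]]:
    """Deal items round-robin into the given number of buckets."""
    out: List[List[T]] = [[] for _ in range(buckets)]
    for i, item in enumerate(items):
        out[i % buckets].append(item)
    return out


def layout(tasks: int, executors: int, workers: int) -> List[List[List[str]]]:
    """Return a workers -> executors -> task names nesting for one component."""
    task_names = [f"task-{i}" for i in range(tasks)]
    tasks_by_executor = deal(task_names, executors)
    # Each executor (a list of tasks) is in turn placed on a worker process.
    return deal(tasks_by_executor, workers)


plan = layout(tasks=4, executors=2, workers=2)
```

With these numbers each worker process hosts one executor, and each executor runs two tasks.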

Use Case: Security

Security customer analytics platform:
● Pulling data from customer sites
● Placing data in a SQL database
● Performing analysis to spot anomalous traffic
● Pushing results back to the client to block traffic

Original system, mean turnaround time: 4.5 hours

Storm / Kafka solution, maximum processing time: 2.6 seconds
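To put the two figures on the same scale, the arithmetic can be worked through as below. The variable names are illustrative. Note that the comparison sets a mean against a maximum, so the ratio is a rough indication rather than a like-for-like speedup.

```python
# Both figures come from the slide; only the unit conversion is added here.
original_mean_s = 4.5 * 60 * 60   # 4.5 hours expressed in seconds
new_max_s = 2.6                   # maximum processing time in seconds

ratio = original_mean_s / new_max_s
print(f"{original_mean_s:.0f} s vs {new_max_s} s: about {ratio:.0f}x faster")
```

That is 16,200 seconds against 2.6 seconds, a factor of roughly 6,200.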

Thank You

Links

Kafka: http://kafka.apache.org/

Storm: http://storm.apache.org/
