Post on 16-Apr-2017
SPRINGONE2GXWASHINGTON, DC
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Developing Real-Time Data
Pipelines with Apache Kafka
Joe Stein@allthingshadoop
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
CEO of Elodina http://www.elodina.net/ a big data as a service platform built on top open source software. The Elodina platform enables customers to analyze data streams and programmatically react to the results in real-time. We solve today’s data analytics needs by providing the tools and support necessary to utilize open source technologies. As users, contributors and committers, Elodina also provides support for frameworks that run on Mesos including Apache Kafka, Exhibitor (Zookeeper), Apache Storm, Apache Cassandra and a whole lot more!
Apache Kafka Committer & PMC Member
LinkedIn: http://linkedin.com/in/charmalloc Twitter : @allthingshadoop
whoami
2
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Contents• Introduction To Kafka
•Overview•Topics, Partitions &
Segments•Data Durability•Replication•Producers•Consumers•Performance•Integration•Quick Start•Operations
3
• Designs• Distributed RPC
o Requesto Processo Response
• Storage & Analyticso Streamo Transformo Analyzeo Storeo Search
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Apache Kafka
4
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Apache KafkaApache Kafka was first open sourced by LinkedIn in 2011Papers
● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf
● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf
● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://kafka.apache.org/
5
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
How Big Data Usually Starts
6
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
More Big Data!
7
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Ah!
8
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
eesh
9
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka de-couples data pipelines
10
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Distributed Replicated LogRead and writeIn real timeAs much as you wantAs fast as your network can go
11
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Topics and Partitions
12
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Log Segments
13
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Distributed Replicated Log
14
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Data Durability
15
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Replication
16
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Producers
17
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Consumers
18
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Consumer Failover
19
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Producer Performance
20
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Consumer Performance
http://kafka.apache.org/documentation.html#maximizingefficiency
21
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Client LibrariesCommunity Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
● Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
● Python - Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
● C - High performance C library with full protocol support● Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy
compression supported. Ruby 1.9.3 and up (CI runs MRI 2.● Clojure - Clojure DSL for the Kafka API● JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation
Wire Protocol Developer's Guide https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
22
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Spring IntegrationGood blog about it
https://spring.io/blog/2015/04/15/using-apache-kafka-for-integration-and-data-processing-pipelines-with-spring
Kafka Integration Source https://github.com/spring-projects/spring-integration-kafka
Spring XD sampleshttps://github.com/spring-projects/spring-xd-samples/tree/master/kafka-source
23
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Quick Starthttps://kafka.apache.org/documentation.html#quickstartDownload the 0.8.2.2 release and un-tar it.> tar -xzf kafka_2.10-0.8.2.2.tgz> cd kafka_2.10-0.8.2.2(use at least four terminal windows)> bin/zookeeper-server-start.sh config/zookeeper.properties> bin/kafka-server-start.sh config/server.properties> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testThis is a messageThis is another message> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginningThis is a messageThis is another message
24
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Operationalizing Kafkahttps://kafka.apache.org/documentation.html#basic_ops
Basic Kafka Operations
● Adding and removing topics
● Modifying topics
● Graceful shutdown
● Balancing leadership
● Checking consumer position
● Mirroring data between clusters
● Expanding your cluster
● Decommissioning brokers
● Increasing replication factor
25
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Running on Mesos
26
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Static Partitioning
27
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Scaling is manual (even if orchestrated)
28
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Static failures require manual intervention
29
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Application Elasticity
30
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
An operating system for your data center
31
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Everything goes on Mesos
32
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka on Mesoshttps://github.com/mesos/kafka
● smart broker.id assignment.● preservation of broker placement (through constraints and/or new features).● ability to-do configuration changes.● rolling restarts (for things like configuration changes).● scaling the cluster up and down with automatic, programmatic and manual
options.● smart partition assignment via constraints visa vi roles, resources and
attributes.
33
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka on MesosScheduler
● Provides the operational automation for a Kafka Cluster.
● Manages the changes to the broker's configuration.
● Exposes a REST API for the CLI to use or any other client.
● Runs on Marathon for high availability.
Executor● The executor interacts with the kafka broker as an intermediary to the scheduler
34
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
REST API & CLI● scheduler - starts the scheduler.● add - adds one more more brokers to the cluster.
● update - changes resources, constraints or broker properties one or more brokers.● remove - take a broker out of the cluster.● start - starts a broker up.● stop - this can either a graceful shutdown or will force kill it (./kafka-mesos.sh help stop)● rebalance - allows you to rebalance a cluster either by selecting the brokers or topics to rebalance. Manual
assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication factor
on a topic.
● help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command}
35
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Launch 20 brokers in seconds
36
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka 0.9KIP (Kafka Improvement Process)
• https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
New Consumer• https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+De
sign
Security • https://cwiki.apache.org/confluence/display/KAFKA/Security• https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=51809888
JIRA• https://issues.apache.org/jira/browse/KAFKA/fixforversion/12328745/?selectedTab=com
.atlassian.jira.jira-projects-plugin:version-issues-panel
37
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Distributed RPC
38
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Reference Architecture
39
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Questions?
http://www.elodina.net
40