www.edureka.co/apache-kafka
Fault Tolerance with Kafka
What will you learn today?
What is Apache Kafka?
Architecture of Kafka
How does Kafka achieve Fault Tolerance?
Hands-On: Fault Tolerance with Kafka
Data: The Ingredient
Data is the main ingredient of Internet applications and typically includes the following:
Page visits and clicks
User activities
Events corresponding to logins
Social networking activities such as likes, shares, and comments
Application-specific metrics (e.g., logs, page load time, performance)
Need: Real-Time Analytics
In today's applications, activity data has become a part of production data and is used to run analytics in real time. These analytics can be:
Delivering advertisements to the masses
Tracking any abnormal user behavior or application hacking
Search based on relevance
Recommendations based on popularity
Messaging Systems
Messaging systems integrate distributed applications by passing messages between them
In the present big-data era, the first challenge is collecting the huge volume of data, and the second is analyzing it. Messaging systems are one way to address both
[Diagrams: Problem and Solution]
Apache Kafka
Apache Kafka is a distributed publish-subscribe messaging system
It was originally developed at LinkedIn and later became an Apache project
Kafka is fast, scalable, durable and distributed by design
Kafka Architecture
[Diagram: three producers publish to the Kafka cluster, and three consumers read from it]
A stream of messages of a particular category is called a topic. Producers publish messages to a topic
A producer can be any application that publishes messages to a topic
Consumers subscribe to topics and consume the messages
A Kafka cluster is a set of servers, each of which is called a broker
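The roles above can be illustrated with a minimal in-memory sketch. This is not the Kafka client API: `ToyBroker`, `ToyConsumer`, and the topic names are hypothetical, and everything runs in a single process purely to show how producers, topics, and consumers relate.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for one broker (not the Kafka API)."""

    def __init__(self):
        # topic name -> ordered list of messages published to that topic
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        """Called by a producer: append a message to the named topic."""
        self.topics[topic].append(message)

    def fetch(self, topic, position):
        """Return every message in the topic from `position` onward."""
        return self.topics[topic][position:]


class ToyConsumer:
    """Hypothetical consumer that remembers how far it has read."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.position = 0  # index of the next unread message

    def poll(self):
        messages = self.broker.fetch(self.topic, self.position)
        self.position += len(messages)
        return messages


broker = ToyBroker()

# Any application that publishes to a topic acts as a producer.
broker.publish("page-views", "user1 viewed /home")
broker.publish("page-views", "user2 viewed /cart")
broker.publish("logins", "user1 logged in")

# A consumer subscribes to one topic and only sees that topic's messages.
views_consumer = ToyConsumer(broker, "page-views")
first_poll = views_consumer.poll()
second_poll = views_consumer.poll()

print(first_poll)   # ['user1 viewed /home', 'user2 viewed /cart']
print(second_poll)  # []
```

The second poll returns nothing because the consumer tracks its own read position; the messages themselves stay in the topic.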
ZooKeeper and Kafka
Each Kafka broker coordinates with other Kafka brokers using ZooKeeper
The ZooKeeper service notifies producers and consumers when a new broker joins the Kafka system or an existing broker fails
Kafka Clusters
With Kafka, we can create several types of clusters, such as the following:
Single node single broker cluster
Single node multiple broker cluster
Multiple nodes multiple broker cluster
Single Node Single Broker Cluster
[Diagram: three producers and three consumers connect to a single Kafka broker, with ZooKeeper alongside]
Single Node Multiple Broker Cluster
[Diagram: three producers and three consumers connect to Broker 1, Broker 2, and Broker 3 on one node, with ZooKeeper alongside]
Multiple Node Multiple Broker Cluster
[Diagram: three producers and three consumers connect to Broker 1 and Broker 2 on each of Node 1 and Node 2, with ZooKeeper alongside]
How does Kafka achieve Fault Tolerance?
For each topic, the Kafka cluster maintains a partitioned log
Each partition is a commit log: an ordered, immutable sequence of messages to which new messages are continually appended
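A partitioned, append-only log can be sketched in a few lines. The classes below are hypothetical and hold everything in memory. Routing by CRC32 of a message key is an assumption made only so the example is deterministic; it is not a description of Kafka's own partitioner.

```python
import zlib


class Partition:
    """One partition: an ordered, append-only sequence of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append at the end and return the message's position (offset)."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read_from(self, offset):
        """Return an immutable view of the messages from `offset` onward."""
        return tuple(self._messages[offset:])


class PartitionedLog:
    """Hypothetical topic log split into a fixed number of partitions."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def append(self, key, message):
        # Assumed routing rule: CRC32 of the key, modulo the partition count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return index, self.partitions[index].append(message)


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("user1", "login")
p2, o2 = log.append("user1", "click")

print(p1 == p2)                         # True: same key, same partition
print(o1, o2)                           # 0 1: positions grow as messages are appended
print(log.partitions[p1].read_from(0))  # ('login', 'click')
```

Ordering is only defined within a single partition in this sketch; nothing relates the positions of messages in different partitions.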
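The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions. Kafka achieves fault tolerance by replicating each partition across a configurable number of servers. For each partition, one server acts as the leader and the others act as followers; if the leader fails, one of the followers takes over as the new leader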
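The effect of replication can be simulated with a small sketch. `ReplicatedPartition` is hypothetical and heavily simplified: every live replica is written synchronously, and when the leader fails the lowest-numbered surviving broker takes over. Real Kafka has its own replication protocol and leader election, which this does not attempt to reproduce.

```python
class ReplicatedPartition:
    """Hypothetical model of one partition copied onto several brokers."""

    def __init__(self, broker_ids):
        self.logs = {broker_id: [] for broker_id in broker_ids}
        self.alive = set(broker_ids)
        self.leader = broker_ids[0]  # assumption: first broker starts as leader

    def append(self, message):
        """Write to every live replica and return the position on the leader."""
        if self.leader is None:
            raise RuntimeError("no live replica left for this partition")
        for broker_id in self.alive:
            self.logs[broker_id].append(message)
        return len(self.logs[self.leader]) - 1

    def fail(self, broker_id):
        """Mark a broker as failed; pick a new leader if it was leading."""
        self.alive.discard(broker_id)
        if broker_id == self.leader:
            survivors = sorted(self.alive)
            # Simplified election: lowest surviving broker id wins.
            self.leader = survivors[0] if survivors else None

    def read_all(self):
        """Read the full log from the current leader."""
        if self.leader is None:
            raise RuntimeError("no live replica left for this partition")
        return list(self.logs[self.leader])


partition = ReplicatedPartition([1, 2, 3])
partition.append("m0")
partition.append("m1")

partition.fail(1)            # the leader goes down
print(partition.leader)      # 2
partition.append("m2")       # writes continue on the survivors
print(partition.read_all())  # ['m0', 'm1', 'm2']
```

With three replicas, this model keeps the partition readable and writable until all three brokers have failed, which is the property the hands-on session demonstrates on a real cluster.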
Hands-On: Fault Tolerance with Kafka
Kafka @ LinkedIn
LinkedIn Newsfeed is powered by Kafka
LinkedIn recommendations are powered by Kafka
LinkedIn notifications are powered by Kafka
Apart from this, LinkedIn uses Kafka for many other purposes, such as log monitoring, performance metrics, and search improvement
Who else uses Kafka?
DataSift uses Kafka as a collector of monitoring events and to track users' consumption of data streams in real time
Wooga uses Kafka to aggregate and process tracking data from all their Facebook games (hosted at various providers) in a central location
Spongecell uses Kafka to run their entire analytics and monitoring pipeline, driving both real-time and ETL applications
Loggly is the world's most popular cloud-based log management service. It uses Kafka for log collection
An exhaustive list of companies using Kafka can be found here: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
References
Apache Kafka: http://kafka.apache.org/
Kafka Papers: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
Powered by Kafka: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
LinkedIn Performance Insights: https://engineering.linkedin.com/samza/real-time-insights-linkedins-performance-using-apache-samza