Fault Tolerance with Kafka

www.edureka.co/apache-kafka

What will you learn today?

What is Apache Kafka?

Architecture of Kafka

How does Kafka achieve fault tolerance?

Hands-On: Fault Tolerance with Kafka

Data: The Ingredient

Data is the main ingredient of Internet applications and typically includes the following:

Page visits and clicks

User activities

Events corresponding to logins

Social networking activities such as likes, shares, and comments

Application-specific metrics (e.g., logs, page load time, performance)

Need: Real-Time Analytics

In today's applications, activity data has become part of production data and is used to run analytics in real time. These analytics include:

Delivering advertisements to the masses

Tracking any abnormal user behavior or application hacking

Relevance-based search

Recommendations based on popularity

Messaging Systems

Messaging systems integrate distributed applications seamlessly by letting them exchange messages

Problem: In the present big-data era, the first challenge is to collect the data, because there is so much of it, and the second challenge is to analyze it

Solution: Use a messaging system

Apache Kafka

Apache Kafka is a distributed publish-subscribe messaging system

Originally developed at LinkedIn, Kafka later became an Apache project

Kafka is fast, scalable, durable and distributed by design

Kafka Architecture

[Figure: three producers publish messages to a Kafka cluster, and three consumers read from it]

A stream of messages of a particular category is called a topic. Producers publish messages to topics

A producer can be any application that publishes messages to a topic

Consumers subscribe to topics and consume the published messages

A Kafka cluster is a set of servers, each of which is called a broker
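To make the roles above concrete, here is a minimal in-memory sketch in Python. `ToyBroker` and `ToyConsumer` are hypothetical names invented for illustration; this is not the Kafka client API and ignores networking, partitions and persistence. It only shows producers appending to named topics and consumers pulling from a topic while each tracks its own read position.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a Kafka broker (not the Kafka API)."""

    def __init__(self):
        # topic name -> list of messages, in publish order
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        # A producer appends a message to the named topic
        self.topics[topic].append(message)

    def fetch(self, topic, position):
        # A consumer pulls every message at or after its current position
        return self.topics[topic][position:]


class ToyConsumer:
    """Subscribes to one topic and remembers how far it has read."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.position = 0

    def poll(self):
        messages = self.broker.fetch(self.topic, self.position)
        self.position += len(messages)
        return messages


broker = ToyBroker()
clicks_a = ToyConsumer(broker, "page-clicks")
clicks_b = ToyConsumer(broker, "page-clicks")

broker.publish("page-clicks", "user1:/home")
broker.publish("page-clicks", "user2:/jobs")
broker.publish("logins", "user1")

print(clicks_a.poll())  # ['user1:/home', 'user2:/jobs']
print(clicks_a.poll())  # []
print(clicks_b.poll())  # ['user1:/home', 'user2:/jobs']
```

Because each consumer keeps its own position, two subscribers can read the same topic independently, and a message on another topic ("logins") is never delivered to "page-clicks" subscribers.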


ZooKeeper and Kafka

Each Kafka broker coordinates with other Kafka brokers using ZooKeeper

ZooKeeper notifies producers and consumers when a new broker joins the Kafka system or an existing broker fails
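The notification mechanism rests on two ZooKeeper ideas: ephemeral nodes, which disappear when the session that created them ends, and watches, which alert clients to changes. ZooKeeper-based Kafka deployments register each live broker under a path such as `/brokers/ids`. The toy model below is a hypothetical sketch of that behaviour in plain Python, not the ZooKeeper client API; for simplicity its watchers stay registered, whereas classic ZooKeeper watches are one-shot triggers that must be re-set.

```python
class ToyRegistry:
    """Hypothetical model of ZooKeeper-style ephemeral nodes plus watches.
    This is NOT the ZooKeeper client API; it only mimics the behaviour."""

    def __init__(self):
        self.brokers = {}   # broker id -> address, like ephemeral znodes
        self.watchers = []  # callbacks notified on membership changes

    def watch(self, callback):
        self.watchers.append(callback)

    def _notify(self, event, broker_id):
        for callback in self.watchers:
            callback(event, broker_id, sorted(self.brokers))

    def register(self, broker_id, address):
        # A broker starting up creates its ephemeral entry
        self.brokers[broker_id] = address
        self._notify("joined", broker_id)

    def session_expired(self, broker_id):
        # A crashed broker stops heartbeating, so its ephemeral entry vanishes
        self.brokers.pop(broker_id, None)
        self._notify("failed", broker_id)


events = []
registry = ToyRegistry()
registry.watch(lambda event, broker_id, live: events.append((event, broker_id, live)))

registry.register(1, "localhost:9092")
registry.register(2, "localhost:9093")
registry.session_expired(1)

for e in events:
    print(e)
# ('joined', 1, [1])
# ('joined', 2, [1, 2])
# ('failed', 1, [2])
```

A producer or consumer playing the watcher role learns the current set of live brokers from each event, which is what lets clients stop sending requests to a failed broker.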

Kafka Clusters

With Kafka, we can create several types of clusters, such as the following:

Single node single broker cluster

Single node multiple broker cluster

Multiple node multiple broker cluster
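In a single node multiple broker cluster, every broker on the machine needs its own `broker.id`, listener port and log directory, while all brokers point at the same ZooKeeper ensemble. The Python helper below is a sketch that writes those per-broker `server.properties` settings; the function name, base port 9092 and `/tmp/kafka-logs-N` paths are illustrative assumptions, and `zookeeper.connect` applies only to ZooKeeper-based Kafka versions like the one this deck describes.

```python
def broker_properties(broker_id, base_port=9092, zookeeper="localhost:2181"):
    """Build server.properties text for one broker on a shared host.

    Brokers sharing a machine must differ in id, listener port and log
    directory; the base port and paths here are illustrative choices.
    """
    settings = {
        "broker.id": broker_id,
        "listeners": f"PLAINTEXT://:{base_port + broker_id}",
        "log.dirs": f"/tmp/kafka-logs-{broker_id}",
        "zookeeper.connect": zookeeper,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Single node, multiple brokers: three brokers on one machine
for broker_id in range(3):
    print(f"# server-{broker_id}.properties")
    print(broker_properties(broker_id))
    print()
```

For a multiple node multiple broker cluster the same idea applies, except that brokers on different machines may reuse ports and paths; only `broker.id` must stay unique across the whole cluster.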

Single Node Single Broker Cluster

[Figure: three producers and three consumers connected to a single Kafka broker, coordinated by ZooKeeper]

Single Node Multiple Broker Cluster

[Figure: three producers and three consumers connected to Broker 1, Broker 2 and Broker 3, all running on a single node and coordinated by ZooKeeper]

Multiple Node Multiple Broker Cluster

[Figure: three producers and three consumers connected to a Kafka cluster spanning Node 1 and Node 2, each node running two brokers, coordinated by ZooKeeper]

How Does Kafka Achieve Fault Tolerance?

For each topic, the Kafka cluster maintains a partitioned log:

Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log
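The partitioned log can be sketched in a few lines of Python. In Kafka, each message in a partition gets a sequential id called its offset; the classes below mirror that with a plain list, where the offset is the list index. Routing messages by key with `zlib.crc32` is an illustrative assumption (Kafka's own default partitioner for keyed messages uses a murmur2 hash); what matters is that a stable hash sends one key to one partition, so that key's messages stay in order.

```python
import zlib


class Partition:
    """Append-only, ordered log; a message's offset is its position."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset of the new message

    def read(self, offset, max_messages=10):
        # Existing entries are never modified; readers just pick a start offset
        return self._log[offset:offset + max_messages]


class Topic:
    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def publish(self, key, message):
        # Stable hash so one key always lands in the same partition
        index = zlib.crc32(key.encode()) % len(self.partitions)
        offset = self.partitions[index].append(message)
        return index, offset


topic = Topic(num_partitions=3)
for n in range(4):
    print(topic.publish("user42", f"click-{n}"))
```

All four messages land in the same partition with offsets 0 through 3. Ordering is guaranteed only within a partition, not across the partitions of a topic.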

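The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions. Kafka achieves fault tolerance by replicating each partition across a configurable number of servers: one replica acts as the leader for the partition and the others replicate it, and if the leader fails, one of the followers automatically becomes the new leader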
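The sketch below illustrates replication and leader failover under two simplifying assumptions: replicas are placed with a plain round-robin scheme (not Kafka's exact placement algorithm), and every follower is assumed to be fully caught up when the leader fails. The function names are hypothetical. The first broker in each replica list plays the leader.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition on `replication_factor` distinct brokers.

    The first broker listed is the partition's leader. This is a simplified
    round-robin, not Kafka's exact placement algorithm.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def fail_over(assignment, dead_broker):
    """Drop a failed broker; the next surviving replica becomes leader."""
    # Simplification: assumes every follower is fully caught up (in sync)
    return {
        p: [b for b in replicas if b != dead_broker]
        for p, replicas in assignment.items()
    }


assignment = assign_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
print(assignment)  # {0: [1, 2], 1: [2, 3], 2: [3, 1]}

after = fail_over(assignment, dead_broker=2)
print(after)       # {0: [1], 1: [3], 2: [3, 1]}
```

When broker 2 fails, partition 1 loses its leader and broker 3 takes over, while every partition keeps at least one live replica. In general, a replication factor of N lets a partition survive up to N - 1 broker failures.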

Hands-On: Fault Tolerance with Kafka

Kafka @ LinkedIn

LinkedIn's newsfeed is powered by Kafka

LinkedIn's recommendations are powered by Kafka

LinkedIn notifications are powered by Kafka

Apart from these, LinkedIn uses Kafka for many other purposes, such as log monitoring, performance metrics and search improvement

Who else uses Kafka?

DataSift uses Kafka as a collector of monitoring events and to track users' consumption of data streams in real time

Wooga uses Kafka to aggregate and process tracking data from all their Facebook games (hosted at various providers) in a central location

Spongecell uses Kafka to run their entire analytics and monitoring pipeline, driving both real-time and ETL applications

Loggly, which describes itself as the world's most popular cloud-based log management service, uses Kafka for log collection

A longer list of companies using Kafka can be found here: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

Survey

Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better!

Please spare a few minutes to take the survey after the webinar.

Thank You …

Questions/Queries/Feedback

Recording and presentation will be made available to you within 24 hours