DeveloperWeek 2020
Who is Instaclustr?
Apache Kafka Managed Solutions
High-performing streaming and queuing technology for large-scale, always-on applications
● SOC 2 Certification
● Apache ZooKeeper
● Built-in Monitoring
● Run in your cloud provider or ours
● 24/7 Expert Support
● Zero Downtime
● Managed Ecosystem
● Intuitive & Flexible API
● High Scalability & Reliability
● High Throughput & Availability
● Integrations
● Rich Ecosystem
Apache Kafka Use Cases
Kafka allows users to build real-time streaming data pipelines.
Kafka can be used for metrics collection, log aggregation, messaging, audit trails, and much more.
A few examples of Use Cases:
● Stream Processing
● Website Activity Tracking
● Log Aggregation
● Metrics Collection and Monitoring
● Network Monitoring
● Internet of Things
● Advertising
● Fraud Detection...
Running Kafka in Production
Cluster Sizing → Take advantage of Kafka’s (low) hardware requirements
Configuring Topics → Be careful with topic configurations
Kafka Redundancy → Set up replication and redundancy the right way
Managing Kafka Logs → Set log configuration parameters to keep logs manageable
Kafka Security → Configure and isolate Kafka with security in mind
Operating System → Avoid outages by raising the Ulimit
Monitoring Kafka → Utilize effective monitoring and alerts
Today’s Agenda
Kafka Review
What is Kafka
Best Practices
Cluster Sizing
Take Advantage of Kafka’s (Low) Hardware Requirements
Cluster Sizing
Perform a Load Test
Run kafka-producer-perf-test
Run kafka-consumer-perf-test
Check for consumer lag during the load test:
kafka-consumer-groups --bootstrap-server BROKER_ADDRESS --describe --group CONSUMER_GROUP
Cluster Sizing Using a Formula
Cluster Size Estimation based on network and disk throughput requirements.
W - MB/sec of data that will be written
R - Replication factor
C - Number of consumer groups, that is the number of readers for each write
Kafka is mostly limited by the disk and network throughput.
Cluster Sizing Using a Formula
W - MB/sec of data that will be written
R - Replication factor
C - Number of consumer groups, that is the number of readers for each write
Writes: W * R
Reads: (R + C - 1) * W
Cluster Sizing Using a Formula
W - MB/sec of data that will be written
R - Replication factor
C - Number of consumer groups, that is the number of readers for each write
Writes: W * R
Reads: (R + C - 1) * W
L = R + C - 1
Based on this, we can calculate our cluster-wide I/O requirements:
Disk Throughput (Read + Write): W * R + L * W
Network Read Throughput: (R + C - 1) * W
Network Write Throughput: W * R
Cluster Sizing Using a Formula - Example
A single server provides a given disk throughput as well as network throughput.
For example, a 1 gigabit Ethernet card with full duplex gives 125 MB/sec read and
125 MB/sec write; likewise, six 7,200 RPM SATA drives might give roughly
300 MB/sec of combined read + write throughput.
Once we know the total requirements, as well as what one machine provides, we
can divide to get the total number of machines needed.
Cluster Sizing Using a Formula - Example
This gives a machine count for running at maximum capacity, assuming no
overhead for network protocols and a perfect balance of data and load.
Since there is protocol overhead as well as imbalance, you want at least 2x this
ideal capacity to ensure sufficient headroom.
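The sizing arithmetic above can be turned into a small calculator. This is a sketch, not a tool from the talk; the workload in the comments (50 MB/sec written, replication factor 3, two consumer groups) is hypothetical, and the machine specs come from the example on the previous slide:

```python
import math

def kafka_cluster_size(w, r, c, disk_mb_s, net_mb_s, safety=2):
    """Estimate broker count from the throughput formulas above.

    w: MB/sec of data written (W), r: replication factor (R),
    c: number of consumer groups (C). disk_mb_s and net_mb_s are what a
    single machine provides (network is per direction, full duplex).
    """
    lagging = r + c - 1                      # L = R + C - 1
    disk = w * r + lagging * w               # disk throughput (read + write)
    net_read = (r + c - 1) * w               # network read throughput
    net_write = w * r                        # network write throughput
    machines = max(disk / disk_mb_s,
                   net_read / net_mb_s,
                   net_write / net_mb_s)
    # 2x headroom for protocol overhead and imbalance, per the slide
    return math.ceil(machines * safety)

# Hypothetical workload: 50 MB/sec written, RF 3, two consumer groups, on
# machines with ~300 MB/sec disk and 125 MB/sec network per direction.
print(kafka_cluster_size(50, 3, 2, disk_mb_s=300, net_mb_s=125))  # prints 4
```

Here the binding constraint is network reads (200 MB/sec against 125 MB/sec per machine), which is typical once consumer groups multiply each write.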
Configuring Topics
Be Careful With Topic Configurations
Configuring Topics Topic, Partition, Replica View
● Topic
○ Logical grouping of data
○ Settings such as replication, number of partitions, log retention, compaction, etc. are controllable at the topic level
● Partition
○ Subset of messages in a topic that:
■ Have a single master broker
■ Guarantee ordered delivery within that subset
■ Within consumer groups, 1 consumer is assigned to read from each partition
○ Number of partitions is set on topic creation
○ Messages are mapped to partitions by key
[Diagram: topic partitions spread across brokers; each broker hosts a mix of master and replica partitions. Legend: Broker, Topic, Partition - Master, Partition - Replica]
Configuring Topics Topic Overview
● A Kafka topic is a stream of records
● Topics are stored in logs
● A log is broken up into partitions and segments
● A topic is a category, stream name, or feed
● Topics are pub/sub
● A topic can have zero or many subscribers - consumer groups
● Topics are broken up and spread across partitions for speed and size
Configuring Topics Topic Partitions
● Topics are broken up into partitions
● The key of a record determines which partition the record is sent to
● Partitions are used to scale Kafka across many servers
● Partitions are used to facilitate parallel consumers
● Records are consumed in parallel, up to the number of partitions
● Order is guaranteed per partition
● Partitions can be replicated to multiple brokers
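The key-to-partition mapping above can be illustrated with a toy partitioner. Kafka's default partitioner actually uses a murmur2 hash of the key; the MD5-based sketch below is only an assumption-free stand-in demonstrating the property that matters here: the same key always maps to the same partition, which is what preserves per-key ordering.

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Toy stand-in for Kafka's default partitioner (which uses murmur2):
    # hash the key deterministically, then take it modulo the partition count.
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return h % num_partitions

# The same key always lands in the same partition, preserving per-key order.
print(pick_partition(b"user-42", 6) == pick_partition(b"user-42", 6))  # prints True
```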
Configuring Topics Topic Partition Log
● Order is maintained only within a single partition
○ A partition is an ordered, immutable sequence of records that is continually appended to - a structured commit log
● Records in partitions are assigned a sequential ID number called the offset
● The offset identifies each record within the partition
● Topic partitions allow a Kafka log to scale beyond a size that will fit on a single server
○ A topic partition must fit on the servers that host it
○ A topic can span many partitions hosted on many servers
● Topic partitions are the unit of parallelism - a partition can only be used by one consumer in a
group at a time
● Consumers can run in their own process or their own thread
● If a consumer stops, Kafka spreads its partitions across the remaining consumers in the group
Configuring Topics Topic Design
● The minimum number of topics is implied by the minimum set of distinct retention, etc.
settings you require
○ -> you probably don’t want to mix message types with different scalability or latency requirements
● The maximum number of topics is largely limited by imagination
● In between is a set of design trade-offs
● In general, pick the minimum number of topics that allows for the required replication, retention, etc.
settings, separates message types with different scale or latency profiles, and does not result in
consumers reading excessive numbers of extra messages.
FEWER TOPICS | MORE TOPICS
Consumers may have to filter messages | Consumers can read only from topics they care about
Less processing overhead of managing masters and consumers | Slower restarts, other processing overheads
Less configuration to manage | More flexibility in configuration
Configuring Topics Topic Partition Log
Right Number of Partitions = Optimal Performance
Configuring Topics Topic Partition Calculation Example
Desire to Read 1 GB/sec from a Topic
Each consumer is only able to process 50 MB/sec
Therefore → 20 partitions and 20 consumers in the consumer group.
Desire to Write 1 GB/sec to a Topic
Each producer is only able to produce 100 MB/sec
Therefore → 10 partitions and 10 producers.
Configuring Topics Topic Partition Sizing Formula
#Partitions = max(NP, NC)
where:
NP → is the number of required producers determined by calculating: TT/TP
NC → is the number of required consumers determined by calculating: TT/TC
TT → is the total expected throughput for our system
TP → is the max throughput of a single producer to a single partition
TC → is the max throughput of a single consumer from a single partition
This calculation gives you a rough indication of the number of partitions.
It's a good place to start.
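The formula translates directly into code. This sketch simply re-applies the slide's definitions, and the numbers reuse the hypothetical 1 GB/sec example from the previous slide:

```python
import math

def partition_count(tt_mb_s, tp_mb_s, tc_mb_s):
    """#Partitions = max(NP, NC) from the slide's formula.

    tt_mb_s: total expected throughput for the system (TT)
    tp_mb_s: max throughput of a single producer to a single partition (TP)
    tc_mb_s: max throughput of a single consumer from a single partition (TC)
    """
    np_ = math.ceil(tt_mb_s / tp_mb_s)   # NP = TT / TP
    nc = math.ceil(tt_mb_s / tc_mb_s)    # NC = TT / TC
    return max(np_, nc)

# 1 GB/sec target, producers at 100 MB/sec, consumers at 50 MB/sec:
print(partition_count(1000, 100, 50))  # prints 20
```

As on the previous slide, the slower side (the consumers) dictates the partition count.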
Configuring Topics Top Design Questions to Keep in Mind
● What topics do I need?
○ Are there distinct streams of message types that require different processing?
○ Are there any different requirements for message retention or resiliency?
○ Would splitting by topic help to reduce consumer load?
For each topic:
● Do I care about ordering?
○ At what level (key) is ordering important?
○ Are there sufficient keys to distribute across Kafka partitions?
○ Is the message distribution per-key relatively consistent?
● How many partitions?
○ What is the max expected throughput per broker and consumer?
○ Partitions = total target throughput / min(broker throughput, consumer throughput) * buffer factor.
○ The buffer factor depends on how evenly your keys distribute data to partitions.
Configuring Topics Things to Keep in Mind
Metadata about partitions is stored in ZooKeeper in the form of znodes.
Having a large number of partitions has effects on ZooKeeper and on client resources:
● Unneeded partitions put extra pressure on ZooKeeper (more network requests), and might
introduce delay in controller and/or partition leader election if a broker goes down.
● Producer and consumer clients need more memory, because they need to keep track of
more partitions and also buffer data for all partitions.
● As a guideline for optimal performance, you should not have more than 3,000 partitions per
broker and not more than 30,000 partitions in a cluster.
More Partitions = Greater CPU Load
More Partitions = Lower Messages Per Second
Conclusion?
10 to 100 partitions per broker is the sweet spot for the number of partitions you should run in a cluster to reach maximum performance
Kafka Redundancy
Set up Replication and Redundancy the Right Way
Kafka Redundancy Replication
Kafka replicates each topic's partitions across a configurable number of Kafka brokers
Each topic partition has one leader and zero or more followers
Leaders and followers are called replicas
Replication factor = 1 leader + N followers
Reads and writes always go to the leader
Partition leadership is evenly shared among Kafka brokers
Logs on followers are kept in sync with the leader's log - an identical copy, sans any un-replicated offsets
Followers pull records in batches from the leader, like a regular Kafka consumer
Kafka Redundancy Broker Failover
Kafka keeps track of which Kafka brokers are alive (in sync)
● To be alive, a Kafka broker must maintain a ZooKeeper session (heartbeat)
● Followers must replicate writes from the leader and not fall "too far" behind
Each leader keeps track of its set of "in-sync replicas", aka ISRs
If an ISR/follower dies or falls behind (lag > replica.lag.time.max.ms), the leader removes the follower from the ISR set
Kafka's guarantee: a committed message is not lost as long as one ISR remains live - a message is "committed" when it is written to all ISRs' logs
Consumers only read committed messages
Kafka Redundancy Kafka Quorum
A quorum is the number of acknowledgements required, and the number of logs that must be compared to elect a leader, such that there is guaranteed to be an overlap.
Most systems use a majority vote - Kafka does not.
Leaders are selected based on having the most complete log.
The problem with a majority-vote quorum is that it does not take many failures to leave you with an inoperable cluster.
Kafka Redundancy Kafka Quorum
If we have a replication factor of 3,
then at least two ISRs must be in sync before the leader declares a sent
message committed.
If a new leader needs to be elected then, with no more than 2 failures, the new
leader is guaranteed to have all committed messages.
Among the followers there must be at least one replica that contains all committed
messages.
Kafka Redundancy Kafka Quorum - ISR (Insync Replicas)
Kafka maintains a set of ISRs for each partition
Only replicas in this ISR set are eligible for leader election
A write to a partition is not committed until all ISRs ack the write
The ISR set is persisted to ZooKeeper whenever it changes
Kafka Redundancy Kafka Quorum - ISR (Insync Replicas)
Any replica that is a member of the ISR set is eligible to be elected leader
This allows producers to keep working without a majority of nodes
It also allows a replica to rejoin the ISR set
● it must fully re-sync first
● even if the replica lost un-flushed data during a crash
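The committed-write guarantee described above is typically enforced with settings like the following; these are standard Kafka configuration keys, but the values are illustrative, not recommendations from the talk:

```properties
# Broker or topic level: require at least 2 in-sync replicas to accept a write
min.insync.replicas=2
# Never elect a replica outside the ISR set as leader
# (avoids losing committed messages for the sake of availability)
unclean.leader.election.enable=false

# Producer side: wait for all in-sync replicas to acknowledge each write
acks=all
```

With acks=all and min.insync.replicas=2 on a replication factor 3 topic, a write only succeeds once it exists on at least two replicas, matching the quorum behavior on the previous slides.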
Managing Kafka Logs
Set Log Configuration Parameters to Keep Logs Manageable
Managing Kafka Logs Overview
Recall that Kafka can delete older records based on
● a time period
● the size of a log
Kafka also supports log compaction, which compacts by record key
Log compaction: keep the latest version of each record and delete older versions
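These deletion mechanisms map onto per-topic configuration settings. The keys below are standard Kafka topic configs; the values are illustrative only:

```properties
# Delete records older than 7 days
retention.ms=604800000
# ...or once the partition log exceeds ~1 GB
retention.bytes=1073741824
# Or compact instead: keep only the latest record per key
cleanup.policy=compact
```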
Managing Kafka Logs Log Compaction
Log compaction retains the last known value for each record key
Useful for restoring state after a crash or system failure, e.g. an in-memory service, a persistent data store, or reloading a cache
A common use of data streams is to log changes to keyed, mutable data, e.g. changes to a database table, or changes to an object in an in-memory microservice
The topic log then holds a full snapshot of the final value for every key - not just the recently changed keys
Downstream consumers can restore state from a log-compacted topic
Managing Kafka Logs Log Compaction Structure
The log has a head and a tail
The head of a compacted log is identical to a traditional Kafka log
New records get appended to the head
Log compaction works on the tail of the log
The tail gets compacted
Records in the tail retain the original offset assigned when they were first written
Managing Kafka Logs Log Compaction: Tail vs Head
Managing Kafka Logs Log Compaction Cleaning
Managing Kafka Logs Log Cleaner
What are three ways Kafka can delete records?
What is log compaction good for?
What is the structure of a compacted log? Describe the structure.
After compaction, do log record offsets change?
What is a partition segment?
Kafka Security
Setting Practical Security Policies for Kafka
Kafka Security Levels of Security
In-flight data encryption:
● Encryption is typically achieved by using SSL/TLS.
○ Inter-broker
○ Client-broker
Authentication:
● Verifying the identity of clients connecting to the Kafka cluster. This is achieved with the SSL and SASL protocols.
Authorization:
● Kafka clients have specific access to Kafka topics.
● Authorization is achieved with ACLs (access control lists).
Kafka Security Security Protocols and Configuration Options
SSL → Encryption and authentication
SASL → Used for authentication
ACL → Access control lists, used for authorization
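A minimal broker-side sketch covering all three layers. This assumes Kafka 2.4+ (for the AclAuthorizer class name) and uses hypothetical keystore paths and passwords that would differ in a real deployment:

```properties
# Encryption + client authentication over TLS
listeners=SSL://0.0.0.0:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/etc/kafka/ssl/broker.keystore.jks     # hypothetical path
ssl.keystore.password=changeit                               # hypothetical
ssl.truststore.location=/etc/kafka/ssl/broker.truststore.jks # hypothetical path
ssl.client.auth=required

# Authorization via ACLs (class name for Kafka 2.4+)
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
allow.everyone.if.no.acl.found=false
```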
Operating System
Configuring the Operating System for Kafka
Operating System Open File Descriptors
Linux ships with 1,024 open file descriptors allowed per process.
Increase the file descriptor count to something large, like 100,000.
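Raising the limit persistently usually means both a PAM limits entry and, for systemd-managed brokers, a unit override. The `kafka` user/service name and the file paths here are assumptions for illustration:

```
# /etc/security/limits.conf (applies to PAM login sessions)
kafka  soft  nofile  100000
kafka  hard  nofile  100000

# systemd unit override, e.g. /etc/systemd/system/kafka.service.d/override.conf
[Service]
LimitNOFILE=100000
```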
Monitoring Kafka
Metrics to Watch For
Monitoring Kafka All Metrics to Watch
● Broker Topic Metrics
● Delayed Operations
● Leader Election Rate and Time Ms
● Message Conversions
● Network and Request Handler Capacity
● Partition Metrics
● Per-Topic Metrics
● Replicas
● Request Metrics
● Synthetic Transactions
● CPU Usage
● Disk Usage
Monitoring Kafka Partition Metrics
● Active Controller Count
● Leader Count
● Offline Partitions
● Partition Count
Monitoring Kafka Leader Elections
● Leader Election
● Unclean Leader Elections
Monitoring Kafka Broker Topic Metrics
● Broker Topic Bytes In
● Broker Topic Bytes Out
● Broker Topic Messages In
Monitoring Kafka Delayed Operations
● Fetch Purgatory Size
● Produce Purgatory Size
Monitoring Kafka Message Conversions
● Fetch Message Conversions Per Sec
● Produce Message Conversions Per Sec
Monitoring Kafka Per-Topic Metrics
● Messages In
● Bytes In/Out
● Message Conversions
● Failed Requests
TO MAXIMIZE USE OF THIS METRIC, GROUP BY TOPIC
Monitoring Kafka Request Metrics
● Fetch Consumer Request Total Time
● Fetch Follower Request Total Time
● Produce Request Total Time
Zeke Dean