DeveloperWeek 2020
Who is Instaclustr?
Apache Kafka Managed Solutions
High-performing streaming and queuing technology for large-scale, always-on applications
● SOC 2 Certification
● Apache ZooKeeper
● Built-in Monitoring
● Run in your cloud provider or ours
● 24/7 Expert Support
● Zero Downtime
● Managed Ecosystem
● Intuitive & Flexible API
● High Scalability & Reliability
● High Throughput & Availability
● Integrations
● Rich Ecosystem
Apache Kafka Use Cases
Kafka allows users to build real-time streaming data pipelines.
Kafka can be used for metrics collection, log aggregation, messaging, audit trails, and much more.
A few examples of Use Cases:
● Stream Processing
● Website Activity Tracking
● Log Aggregation
● Metrics Collection and Monitoring
● Network Monitoring
● Internet of Things
● Advertising
● Fraud Detection...
Running Kafka in Production
Cluster Sizing → Take advantage of Kafka’s (low) hardware requirements
Configuring Topics → Be careful with topic configurations
Kafka Redundancy → Set up replication and redundancy the right way
Managing Kafka Logs → Set log configuration parameters to keep logs manageable
Kafka Security → Configure and isolate Kafka with security in mind
Operating System → Avoid outages by raising the Ulimit
Monitoring Kafka → Utilize effective monitoring and alerts
Today’s Agenda
Kafka Review
What is Kafka
Best Practices
Cluster Sizing
Take Advantage of Kafka’s (Low) Hardware Requirements
Cluster Sizing
Perform a Load Test
Run kafka-producer-perf-test
Run kafka-consumer-perf-test
Check for consumer lag during the load test:
kafka-consumer-groups --bootstrap-server BROKER_ADDRESS --describe --group CONSUMER_GROUP
Cluster Sizing Using a Formula
Cluster Size Estimation based on network and disk throughput requirements.
W - MB/sec of data that will be written
R - Replication factor
C - Number of consumer groups, that is the number of readers for each write
Kafka is mostly limited by the disk and network throughput.
Cluster Sizing Using a Formula
W - MB/sec of data that will be written
R - Replication factor
C - Number of consumer groups, that is the number of readers for each write
Writes: W * R
Reads: (R + C - 1) * W
Cluster Sizing Using a Formula
W - MB/sec of data that will be written
R - Replication factor
C - Number of consumer groups, that is the number of readers for each write
Writes: W * R
Reads: (R + C - 1) * W
L = R + C - 1
Based on this, we can calculate our cluster-wide I/O requirements:
Disk Throughput (Read + Write): W * R + L * W
Network Read Throughput: (R + C - 1) * W
Network Write Throughput: W * R
Cluster Sizing Using a Formula - Example
A single server provides a given disk throughput as well as network throughput.
For example, a 1 gigabit Ethernet card with full duplex gives 125 MB/sec read and
125 MB/sec write; likewise, six 7,200 RPM SATA drives might give roughly
300 MB/sec of combined read + write throughput.
Once we know the total requirements, as well as what one machine provides, we
can divide to get the total number of machines needed.
Cluster Sizing Using a Formula - Example
This gives a machine count for running at maximum capacity, assuming no
overhead for network protocols and a perfect balance of data and load.
Since there is protocol overhead as well as imbalance, you want at least 2x this
ideal capacity to ensure sufficient headroom.
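The sizing arithmetic above can be turned into a small calculator. This is a sketch, not a tool from the talk; the workload in the comments (50 MB/sec written, replication factor 3, two consumer groups) is hypothetical, and the machine specs come from the example on the previous slide:

```python
import math

def kafka_cluster_size(w, r, c, disk_mb_s, net_mb_s, safety=2):
    """Estimate broker count from the throughput formulas above.

    w: MB/sec of data written (W), r: replication factor (R),
    c: number of consumer groups (C). disk_mb_s and net_mb_s are what a
    single machine provides (network is per direction, full duplex).
    """
    lagging = r + c - 1                      # L = R + C - 1
    disk = w * r + lagging * w               # disk throughput (read + write)
    net_read = (r + c - 1) * w               # network read throughput
    net_write = w * r                        # network write throughput
    machines = max(disk / disk_mb_s,
                   net_read / net_mb_s,
                   net_write / net_mb_s)
    # 2x headroom for protocol overhead and imbalance, per the slide
    return math.ceil(machines * safety)

# Hypothetical workload: 50 MB/sec written, RF 3, two consumer groups, on
# machines with ~300 MB/sec disk and 125 MB/sec network per direction.
print(kafka_cluster_size(50, 3, 2, disk_mb_s=300, net_mb_s=125))  # prints 4
```

Here the binding constraint is network reads (200 MB/sec against 125 MB/sec per machine), which is typical once consumer groups multiply each write.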
Configuring Topics
Be Careful With Topic Configurations
Configuring Topics Topic, Partition, Replica View
● Topic
○ Logical grouping of data
○ Settings such as replication, number of partitions, log retention, compaction, etc. are controllable at the topic level
● Partition
○ Subset of messages in a topic that:
■ Have a single master broker
■ Guarantee ordered delivery within that subset
■ Within consumer groups, 1 consumer is assigned to read from each partition
○ Number of partitions is set on topic creation
○ Messages are mapped to partitions by key
[Diagram: topic partitions spread across brokers; each broker hosts a mix of master and replica partitions. Legend: Broker, Topic, Partition - Master, Partition - Replica]
Configuring Topics Topic Overview
● A Kafka topic is a stream of records
● Topics are stored in logs
● A log is broken up into partitions and segments
● A topic is a category, stream name, or feed
● Topics are pub/sub
● A topic can have zero or many subscribers - consumer groups
● Topics are broken up and spread across partitions for speed and size
Configuring Topics Topic Partitions
● Topics are broken up into partitions
● The key of a record determines which partition the record is sent to
● Partitions are used to scale Kafka across many servers
● Partitions are used to facilitate parallel consumers
● Records are consumed in parallel, up to the number of partitions
● Order is guaranteed per partition
● Partitions can be replicated to multiple brokers
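The key-to-partition mapping above can be illustrated with a toy partitioner. Kafka's default partitioner actually uses a murmur2 hash of the key; the MD5-based sketch below is only an assumption-free stand-in demonstrating the property that matters here: the same key always maps to the same partition, which is what preserves per-key ordering.

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Toy stand-in for Kafka's default partitioner (which uses murmur2):
    # hash the key deterministically, then take it modulo the partition count.
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return h % num_partitions

# The same key always lands in the same partition, preserving per-key order.
print(pick_partition(b"user-42", 6) == pick_partition(b"user-42", 6))  # prints True
```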
Configuring Topics Topic Partition Log
● Order is maintained only within a single partition
○ A partition is an ordered, immutable sequence of records that is continually appended to - a structured commit log
● Records in partitions are assigned a sequential ID number called the offset
● The offset identifies each record within the partition
● Topic partitions allow a Kafka log to scale beyond a size that will fit on a single server
○ A topic partition must fit on the servers that host it
○ A topic can span many partitions hosted on many servers
● Topic partitions are the unit of parallelism - a partition can only be used by one consumer in a
group at a time
● Consumers can run in their own process or their own thread
● If a consumer stops, Kafka spreads its partitions across the remaining consumers in the group
Configuring Topics Topic Design
● The minimum number of topics is implied by the minimum set of distinct retention, etc.
settings you require
○ -> you probably don’t want to mix message types with different scalability or latency requirements
● The maximum number of topics is largely limited by imagination
● In between is a set of design trade-offs
● In general, pick the minimum number of topics that allows for the required replication, retention, etc.
settings, separates message types with different scale or latency profiles, and does not result in
consumers reading excessive numbers of extra messages.
FEWER TOPICS | MORE TOPICS
Consumers may have to filter messages | Consumers can read only from topics they care about
Less processing overhead of managing masters and consumers | Slower restarts, other processing overheads
Less configuration to manage | More flexibility in configuration
Configuring Topics Topic Partition Log
Right Number of Partitions = Optimal Performance
Configuring Topics Topic Partition Calculation Example
Desire to Read 1 GB/sec from a Topic
Each consumer is only able to process 50 MB/sec
Therefore → 20 partitions and 20 consumers in the consumer group.
Desire to Write 1 GB/sec to a Topic
Each producer is only able to produce 100 MB/sec
Therefore → 10 partitions and 10 producers.
Configuring Topics Topic Partition Sizing Formula
#Partitions = max(NP, NC)
where:
NP → is the number of required producers determined by calculating: TT/TP
NC → is the number of required consumers determined by calculating: TT/TC
TT → is the total expected throughput for our system
TP → is the max throughput of a single producer to a single partition
TC → is the max throughput of a single consumer from a single partition
This calculation gives you a rough indication of the number of partitions.
It's a good place to start.
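The formula translates directly into code. This sketch simply re-applies the slide's definitions, and the numbers reuse the hypothetical 1 GB/sec example from the previous slide:

```python
import math

def partition_count(tt_mb_s, tp_mb_s, tc_mb_s):
    """#Partitions = max(NP, NC) from the slide's formula.

    tt_mb_s: total expected throughput for the system (TT)
    tp_mb_s: max throughput of a single producer to a single partition (TP)
    tc_mb_s: max throughput of a single consumer from a single partition (TC)
    """
    np_ = math.ceil(tt_mb_s / tp_mb_s)   # NP = TT / TP
    nc = math.ceil(tt_mb_s / tc_mb_s)    # NC = TT / TC
    return max(np_, nc)

# 1 GB/sec target, producers at 100 MB/sec, consumers at 50 MB/sec:
print(partition_count(1000, 100, 50))  # prints 20
```

As on the previous slide, the slower side (the consumers) dictates the partition count.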
Configuring Topics Top Design Questions to Keep in Mind
● What topics do I need?
○ Are there distinct streams of message types that require different processing?
○ Are there any different requirements for message retention or resiliency?
○ Would splitting by topic help to reduce consumer load?
For each topic:
● Do I care about ordering?
○ At what level (key) is ordering important?
○ Are there sufficient keys to distribute across Kafka partitions?
○ Is the message distribution per-key relatively consistent?
● How many partitions?
○ What is the max expected throughput per broker and consumer?
○ Partitions = total target throughput / min(broker throughput, consumer throughput) * buffer factor.
○ The buffer factor depends on how evenly your keys distribute data to partitions.
Configuring Topics Things to Keep in Mind
Metadata about partitions is stored in ZooKeeper in the form of znodes.
Having a large number of partitions has effects on ZooKeeper and on client resources:
● Unneeded partitions put extra pressure on ZooKeeper (more network requests), and might
introduce delay in controller and/or partition leader election if a broker goes down.
● Producer and consumer clients need more memory, because they need to keep track of
more partitions and also buffer data for all partitions.
● As a guideline for optimal performance, you should not have more than 3,000 partitions per
broker and not more than 30,000 partitions in a cluster.
More Partitions = Greater CPU Load
More Partitions = Lower Messages Per Second
Conclusion?
10 to 100 partitions per broker is the sweet spot for the number of partitions you should run in a cluster to reach maximum performance
Kafka Redundancy
Set up Replication and Redundancy the Right Way
Kafka Redundancy Replication
Kafka replicates each topic's partitions across a configurable number of Kafka brokers
Each topic partition has one leader and zero or more followers
Leaders and followers are called replicas
Replication factor = 1 leader + N followers
Reads and writes always go to the leader
Partition leadership is evenly shared among Kafka brokers
Logs on followers are kept in sync with the leader's log - an identical copy, sans any un-replicated offsets
Followers pull records in batches from the leader, like a regular Kafka consumer
Kafka Redundancy Broker Failover
Kafka keeps track of which Kafka brokers are alive (in sync)
● To be alive, a Kafka broker must maintain a ZooKeeper session (heartbeat)
● Followers must replicate writes from the leader and not fall "too far" behind
Each leader keeps track of its set of "in-sync replicas", aka ISRs
If an ISR/follower dies or falls behind (lag > replica.lag.time.max.ms), the leader removes the follower from the ISR set
Kafka's guarantee: a committed message is not lost as long as one ISR remains live - a message is "committed" when it is written to all ISRs' logs
Consumers only read committed messages
Kafka Redundancy Kafka Quorum
A quorum is the number of acknowledgements required, and the number of logs that must be compared to elect a leader, such that there is guaranteed to be an overlap.
Most systems use a majority vote - Kafka does not.
Leaders are selected based on having the most complete log.
The problem with a majority-vote quorum is that it does not take many failures to leave you with an inoperable cluster.
Kafka Redundancy Kafka Quorum
If we have a replication factor of 3,
then at least two ISRs must be in sync before the leader declares a sent
message committed.
If a new leader needs to be elected then, with no more than 2 failures, the new
leader is guaranteed to have all committed messages.
Among the followers there must be at least one replica that contains all committed
messages.
Kafka Redundancy Kafka Quorum - ISR (Insync Replicas)
Kafka maintains a set of ISRs for each partition
Only replicas in this ISR set are eligible for leader election
A write to a partition is not committed until all ISRs ack the write
The ISR set is persisted to ZooKeeper whenever it changes
Kafka Redundancy Kafka Quorum - ISR (Insync Replicas)
Any replica that is a member of the ISR set is eligible to be elected leader
This allows producers to keep working without a majority of nodes
It also allows a replica to rejoin the ISR set
● it must fully re-sync first
● even if the replica lost un-flushed data during a crash
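The committed-write guarantee described above is typically enforced with settings like the following; these are standard Kafka configuration keys, but the values are illustrative, not recommendations from the talk:

```properties
# Broker or topic level: require at least 2 in-sync replicas to accept a write
min.insync.replicas=2
# Never elect a replica outside the ISR set as leader
# (avoids losing committed messages for the sake of availability)
unclean.leader.election.enable=false

# Producer side: wait for all in-sync replicas to acknowledge each write
acks=all
```

With acks=all and min.insync.replicas=2 on a replication factor 3 topic, a write only succeeds once it exists on at least two replicas, matching the quorum behavior on the previous slides.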
Managing Kafka Logs
Set Log Configuration Parameters to Keep Logs Manageable
Managing Kafka Logs Overview
Recall that Kafka can delete older records based on
● a time period
● the size of a log
Kafka also supports log compaction, which compacts by record key
Log compaction: keep the latest version of each record and delete older versions
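These deletion mechanisms map onto per-topic configuration settings. The keys below are standard Kafka topic configs; the values are illustrative only:

```properties
# Delete records older than 7 days
retention.ms=604800000
# ...or once the partition log exceeds ~1 GB
retention.bytes=1073741824
# Or compact instead: keep only the latest record per key
cleanup.policy=compact
```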
Managing Kafka Logs Log Compaction
Log compaction retains the last known value for each record key
Useful for restoring state after a crash or system failure, e.g. an in-memory service, a persistent data store, or reloading a cache
A common use of data streams is to log changes to keyed, mutable data, e.g. changes to a database table, or changes to an object in an in-memory microservice
The topic log then holds a full snapshot of the final value for every key - not just the recently changed keys
Downstream consumers can restore state from a log-compacted topic
Managing Kafka Logs Log Compaction Structure
The log has a head and a tail
The head of a compacted log is identical to a traditional Kafka log
New records get appended to the head
Log compaction works on the tail of the log
The tail gets compacted
Records in the tail retain the original offset assigned when they were first written
Managing Kafka Logs Log Compaction: Tail vs Head
Managing Kafka Logs Log Compaction Cleaning
Managing Kafka Logs Log Cleaner
What are three ways Kafka can delete records?
What is log compaction good for?
What is the structure of a compacted log? Describe the structure.
After compaction, do log record offsets change?
What is a partition segment?
Kafka Security
Setting Practical Security Policies for Kafka
Kafka Security Levels of Security
In-flight data encryption:
● Encryption is typically achieved by using SSL/TLS.
○ Inter-broker
○ Client-broker
Authentication:
● Verifying the identity of clients connecting to the Kafka cluster. This is achieved with the SSL and SASL protocols.
Authorization:
● Kafka clients have specific access to Kafka topics.
● Authorization is achieved with ACLs (access control lists).
Kafka Security Security Protocols and Configuration Options
SSL → Encryption and authentication
SASL → Used for authentication
ACL → Access control lists, used for authorization
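A minimal broker-side sketch covering all three layers. This assumes Kafka 2.4+ (for the AclAuthorizer class name) and uses hypothetical keystore paths and passwords that would differ in a real deployment:

```properties
# Encryption + client authentication over TLS
listeners=SSL://0.0.0.0:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/etc/kafka/ssl/broker.keystore.jks     # hypothetical path
ssl.keystore.password=changeit                               # hypothetical
ssl.truststore.location=/etc/kafka/ssl/broker.truststore.jks # hypothetical path
ssl.client.auth=required

# Authorization via ACLs (class name for Kafka 2.4+)
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
allow.everyone.if.no.acl.found=false
```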
Operating System
Configuring the Operating System for Kafka
Operating System Open File Descriptors
Linux ships with 1,024 open file descriptors allowed per process.
Increase the file descriptor count to something large, like 100,000.
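Raising the limit persistently usually means both a PAM limits entry and, for systemd-managed brokers, a unit override. The `kafka` user/service name and the file paths here are assumptions for illustration:

```
# /etc/security/limits.conf (applies to PAM login sessions)
kafka  soft  nofile  100000
kafka  hard  nofile  100000

# systemd unit override, e.g. /etc/systemd/system/kafka.service.d/override.conf
[Service]
LimitNOFILE=100000
```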
Monitoring Kafka
Metrics to Watch For
Monitoring Kafka All Metrics to Watch
● Broker Topic Metrics
● Delayed Operations
● Leader Election Rate and Time Ms
● Message Conversions
● Network and Request Handler Capacity
● Partition Metrics
● Per-Topic Metrics
● Replicas
● Request Metrics
● Synthetic Transactions
● CPU Usage
● Disk Usage
Monitoring Kafka Partition Metrics
● Active Controller Count
● Leader Count
● Offline Partitions
● Partition Count
Monitoring Kafka Leader Elections
● Leader Election
● Unclean Leader Elections
Monitoring Kafka Broker Topic Metrics
● Broker Topic Bytes In
● Broker Topic Bytes Out
● Broker Topic Messages In
Monitoring Kafka Delayed Operations
● Fetch Purgatory Size
● Produce Purgatory Size
Monitoring Kafka Message Conversions
● Fetch Message Conversions Per Sec
● Produce Message Conversions Per Sec
Monitoring Kafka Per-Topic Metrics
● Messages In
● Bytes In/Out
● Message Conversions
● Failed Requests
TO MAXIMIZE USE OF THIS METRIC, GROUP BY TOPIC
Monitoring Kafka Request Metrics
● Fetch Consumer Request Total Time
● Fetch Follower Request Total Time
● Produce Request Total Time
Zeke Dean