Kafka at Peak Performance
SITE RELIABILITY ENGINEERING ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka at Peak Performance
Todd Palino
Staff Site Reliability Engineer
LinkedIn, Data Infrastructure Streaming
Who Am I?
Kafka At LinkedIn
– 1100+ Kafka brokers, over 32,000 topics, 350,000+ partitions
– 875 billion messages per day, 185 terabytes in, 675 terabytes out
– Peak load (whole site): 10.5 million messages/sec, 18.5 gigabits/sec inbound, 70.5 gigabits/sec outbound
– 1800+ Kafka brokers, over 79,000 topics, 1,130,000+ partitions
– 1.3 trillion messages per day, 330 terabytes in, 1.2 petabytes out
– Peak load (single cluster): 2 million messages/sec, 4.7 gigabits/sec inbound, 15 gigabits/sec outbound
What Will We Talk About?
Picking Your Hardware
Monitoring the Cluster
Triaging Broker Performance Problems
Conclusion
Hardware Selection
What’s Important To You?
Message Retention - Disk size
Message Throughput - Network capacity
Producer Performance - Disk I/O
Consumer Performance - Memory
Go Wide
Kafka is well-suited to horizontal scaling
RAIS - Redundant Array of Inexpensive Servers
Also helps with CPU utilization
– Kafka needs to decompress and recompress every message batch
– KIP-31 will help with this by eliminating recompression
Don’t co-locate Kafka
Disk Layout
RAID
– Can survive a single disk failure (not RAID 0)
– Provides the broker with a single log directory
– Eats up disk I/O
JBOD
– Gives Kafka all the disk I/O available
– Broker is not smart about balancing partitions
– If one disk fails, the entire broker stops
Amazon EBS performance works!
Operating System Tuning
Filesystem Options
– EXT or XFS
– Using unsafe mount options
Virtual Memory
– Swappiness
– Dirty pages
Networking
Java
Only use JDK 8 now
Keep heap size small
– Even our largest brokers use a 6 GB heap
– Save the rest for page cache
Garbage Collection - G1 all the way
– Basic tuning only
– Watch for humongous allocations
How Much Do You Need?
Buy The Book!
Early Access available now.
Covers all aspects of Kafka, from setup to client development to ongoing administration and troubleshooting.
Also discusses stream processing and other use cases.
Kafka Cluster Sizing
How big for your local cluster?
– How much disk space do you have?
– How much network bandwidth do you have?
– CPU, memory, disk I/O
How big for your aggregate cluster?
– In general, multiply the number of brokers by the number of local clusters
– May have additional concerns with lots of consumers
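The disk and network questions above can be turned into a rough sizing sketch. This is back-of-the-envelope only; all of the input numbers and the 60% NIC budget are hypothetical examples, not LinkedIn's values.

```python
import math

def brokers_needed(ingest_mb_s, retention_hours, replication_factor,
                   disk_per_broker_gb, nic_gbps, nic_budget=0.6):
    """Rough broker count for a local cluster, taking the larger of the
    disk-bound and network-bound estimates."""
    # Disk: retained bytes = ingest rate * retention window * replication
    retained_gb = ingest_mb_s * 3600 * retention_hours * replication_factor / 1024
    by_disk = retained_gb / disk_per_broker_gb
    # Network: brokers also carry replication traffic and serve consumers,
    # so budget only a fraction of each NIC for produce traffic.
    ingest_gbps = ingest_mb_s * 8 / 1024
    by_net = ingest_gbps * replication_factor / (nic_gbps * nic_budget)
    return max(math.ceil(by_disk), math.ceil(by_net))

# Example: 200 MB/s ingest, 72h retention, RF=3, 8 TB usable disk, 10 GbE
print(brokers_needed(200, 72, 3, 8000, 10))  # → 19 (disk-bound here)
```

With these example numbers the cluster is disk-bound; a shorter retention window or bigger disks would shift the constraint to the network side.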
Topic Configuration
Partition Counts for Local
– Many theories on how to do this correctly, but the answer is “it depends”
– How many consumers do you have?
– Do you have specific partition requirements?
– Keeping partition sizes manageable
Partition Counts for Aggregate
– Multiply the number of partitions in a local cluster by the number of local clusters
– Periodically review partition counts in all clusters
Message Retention
– If aggregate is where you really need the messages, only retain it in local for long enough to cover mirror maker problems
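The aggregate-count and local-retention rules of thumb above reduce to simple arithmetic. A minimal sketch; the concrete numbers and the 2x catch-up factor are hypothetical examples:

```python
def aggregate_partitions(local_partitions, local_clusters):
    """Aggregate holds every local cluster's copy of a topic, so its
    partition count is the local count times the number of local clusters."""
    return local_partitions * local_clusters

def local_retention_hours(max_mirror_outage_hours, catchup_factor=2):
    """Retain in local long enough to ride out a mirror maker outage plus
    the time needed to catch back up afterwards (assumed 2x here)."""
    return max_mirror_outage_hours * catchup_factor

# 8 partitions per local cluster, 3 datacenters
print(aggregate_partitions(8, 3))     # → 24
# Plan for surviving a 12-hour mirror maker outage
print(local_retention_hours(12))      # → 24
```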
Possible Broker Improvements
Namespaces
– Namespace topics by datacenter
– Eliminate local clusters and just have aggregate
– Significant hardware savings
JBOD Fixes
– Intelligent partition assignment
– Admin tools to move partitions between mount points
– Broker should not fail completely with a single disk failure
Administrative Improvements
Multiple cluster management
– Topic management across clusters
– Visualization of mirror maker paths
Better client monitoring
– Burrow for consumer monitoring
– No open source solution for producer monitoring (audit)
End-to-end availability monitoring
Keeping An Eye On Things
Monitoring The Foundation
CPU Load
Network inbound and outbound
Filehandle usage for Kafka
Disk
– Free space - where you write logs, and where Kafka stores messages
– Free inodes
– I/O performance - at least average wait and percent utilization
Garbage Collection
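The disk checks above (free space and free inodes) are easy to script with the standard library. A minimal sketch; `"/"` stands in for whatever path your `log.dirs` actually points at:

```python
import os
import shutil

def disk_report(path):
    """Return free-space percentage and free inodes for the filesystem
    holding `path` -- the two disk-capacity signals worth alerting on."""
    usage = shutil.disk_usage(path)       # total/used/free bytes
    st = os.statvfs(path)                 # POSIX filesystem stats
    return {
        "free_pct": 100.0 * usage.free / usage.total,
        "free_inodes": st.f_favail,       # inodes available to this process
    }

# In practice, run this for every mount point listed in log.dirs
print(disk_report("/"))
```

Note that `os.statvfs` is POSIX-only; on Linux this covers both signals with no external dependencies.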
Broker Ground Rules
Tuning
– Stick (mostly) with the defaults
– Set default cluster retention as appropriate
– Default partition count should be at least the number of brokers
Monitoring
– Watch the right things
– Don’t try to alert on everything
Triage and Resolution
– Solve problems, don’t mask them
Too Much Information!
Monitoring teams hate Kafka
– Per-topic metrics
– Per-partition metrics
– Per-client metrics
Capture as much as you can
– Many metrics are useful while triaging an issue
Clients want metrics on their own topics
Only alert on what is needed to signal a problem
Broker Monitoring
Bytes In and Out, Messages In
– Why not messages out?
Partitions
– Count and Leader Count
– Under Replicated and Offline
Threads
– Network pool, Request pool
– Max Dirty Percent
Requests
– Rates and times - total, queue, local, and send
Topic Monitoring
Bytes In, Bytes Out
Messages In, Produce Rate, Produce Failure Rate
Fetch Rate, Fetch Failure Rate
Partition Bytes, Log End Offset
– Why bother?
– KIP-32 will make this unnecessary
Quota Throttling
Provide this to your customers for them to alert on
Client Monitoring
For consumers, use Burrow
– Monitor all partitions for all consumers
– Provides an easy-to-digest “good, warning, bad” state, with detail available
– Fast and free
Producers are a little harder
– Several internal implementations of message auditing
– The community needs a good open source standard
Cluster availability monitoring
– kafka-monitoring is coming soon from LinkedIn!
It’s Broken! Now What?
All The Best Ops People…
Know more of what is happening than their customers
Are proactive
Fix bugs rather than working around them
This applies to our developers too!
Anticipating Trouble
Trend cluster utilization and growth over time
Use default configurations for quotas and retention to require customers to talk to you
Monitor request times
– If you can develop a consistent baseline, deviations from it are an early warning
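One simple way to turn request-time history into that early-warning signal is a rolling mean-and-deviation baseline. A sketch only; the sample values and the 3-sigma threshold are arbitrary examples, and a production check would use a longer window:

```python
from statistics import mean, stdev

def is_anomalous(history, sample, n_sigma=3.0):
    """Flag a request-time sample that falls well outside the baseline
    built from recent history (simple z-score test)."""
    if len(history) < 2:
        return False              # not enough data for a baseline yet
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(sample - mu) > n_sigma * sigma

# Hypothetical TotalTimeMs samples for one request type, in milliseconds
baseline = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
print(is_anomalous(baseline, 10.4))   # → False: within normal variation
print(is_anomalous(baseline, 25.0))   # → True: well outside the baseline
```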
Under Replicated Partitions
Count of partitions that are not fully replicated within the cluster
Also referred to as “replica lag”
Primary indicator of problems within the cluster
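Because a single nonzero sample often just reflects a broker restart, it helps to alert only when under-replication is sustained. A minimal sketch of that logic; the 3-sample window is an arbitrary example:

```python
def urp_alert(samples, sustained=3):
    """Return True when UnderReplicatedPartitions stays nonzero for
    `sustained` consecutive samples, filtering out transient blips."""
    streak = 0
    for count in samples:
        streak = streak + 1 if count > 0 else 0
        if streak >= sustained:
            return True
    return False

print(urp_alert([0, 4, 0, 0]))        # → False: transient blip, no page
print(urp_alert([0, 4, 4, 4, 2]))     # → True: sustained under-replication
```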
Broker Performance Checks
Are you still running 0.8?
Are all the brokers in the cluster working?
Are the network interfaces saturated?
– Reelect partition leaders
– Rebalance partitions in the cluster
– Spread out traffic more (increase partitions or brokers)
Is the CPU utilization high? (especially iowait)
– Is another process competing for resources?
– Look for a bad disk
Do you have really big messages?
Kafka’s OK, Now What?
If Kafka is working properly, it’s probably a client issue
– Don’t throw it over the fence. Help your customers understand
Common producer issues
– Batch size and linger time
– Receive and send buffers
– Sync vs. async, and acknowledgements
Common consumer issues
– Garbage collection problems
– Min fetch bytes and max wait time
– Not enough partitions
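The batch size and linger time tradeoff above can be put in rough numbers: how full does a per-partition batch get before the linger window expires? A sketch with entirely hypothetical inputs; the 16384-byte default matches the producer's stock `batch.size`:

```python
def avg_batch_bytes(msgs_per_sec, msg_bytes, linger_ms, batch_size=16384):
    """Approximate bytes accumulated for one partition during the linger
    window, capped at batch.size (a full batch is sent immediately)."""
    accumulated = msgs_per_sec * msg_bytes * linger_ms / 1000
    return min(accumulated, batch_size)

# A low-rate producer barely fills its batches with a short linger...
print(avg_batch_bytes(100, 500, linger_ms=5))    # → 250.0 bytes
# ...so raising linger.ms trades a little latency for much better batching
print(avg_batch_bytes(100, 500, linger_ms=100))  # → 5000.0 bytes
```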
Conclusion
One Ecosystem
Kafka can scale to millions of messages per second, and more
– Operations must scale the cluster appropriately
– Developers must use the right tuning and go parallel
Few problems are owned by only one side
– Expanding partitions often requires coordination
– Applications that need higher reliability drive cluster configurations
Either we work together, or we fail separately
Would You Like To Know More?
Presentations: http://www.slideshare.net/toddpalino
– More Datacenters, More Problems
– Kafka As A Service
– Always download the originals for slide notes!
Blog Posts: https://engineering.linkedin.com/blog
– Development and SRE blogs on Kafka and other topics
LinkedIn Open Source: https://github.com/linkedin/streaming
– Burrow Consumer Monitoring - https://github.com/linkedin/Burrow
– Kafka Admin Tools - https://github.com/linkedin/kafka-tools
Getting Involved With Kafka
http://kafka.apache.org
Join the mailing lists
– [email protected]
– [email protected]
irc.freenode.net - #apache-kafka
Meetups
– Apache Kafka - http://www.meetup.com/http-kafka-apache-org
– Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/
Contribute code
Data @ LinkedIn is Hiring!
Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next generation change capture technology (incubating)
LinkedIn
– Strong commitment to open source
– Do cool things and work with awesome people
Join us in working on cutting edge stream processing infrastructures
– Please contact [email protected]
– Software developers and Site Reliability Engineers at all levels
Appendix
JDK Options
Heap Size: -Xmx6g -Xms6g
Metaspace: -XX:MetaspaceSize=96m -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80
G1 Tuning: -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M
GC Logging: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:/path/to/logs/gc.log -verbose:gc
Error Handling: -XX:-HeapDumpOnOutOfMemoryError -XX:ErrorFile=/path/to/logs/hs_err.log
OS Tuning Parameters
Networking:
net.core.rmem_default = 124928
net.core.rmem_max = 2048000
net.core.wmem_default = 124928
net.core.wmem_max = 2048000
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_max_syn_backlog = 1024
OS Tuning Parameters (cont.)
Virtual Memory:
vm.oom_kill_allocating_task = 1
vm.max_map_count = 200000
vm.swappiness = 1
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500
vm.dirty_ratio = 60
vm.dirty_background_ratio = 5
Kafka Broker Sensors
kafka.server:name=BytesInPerSec,type=BrokerTopicMetrics
kafka.server:name=BytesOutPerSec,type=BrokerTopicMetrics
kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics
kafka.server:name=PartitionCount,type=ReplicaManager
kafka.server:name=LeaderCount,type=ReplicaManager
kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager
kafka.server:name=RequestHandlerAvgIdlePercent,type=KafkaRequestHandlerPool
kafka.controller:name=ActiveControllerCount,type=KafkaController
kafka.controller:name=OfflinePartitionsCount,type=KafkaController
kafka.log:name=max-dirty-percent,type=LogCleanerManager
kafka.network:name=NetworkProcessorAvgIdlePercent,type=SocketServer
kafka.network:name=RequestsPerSec,request=*,type=RequestMetrics
kafka.network:name=RequestQueueTimeMs,request=*,type=RequestMetrics
kafka.network:name=LocalTimeMs,request=*,type=RequestMetrics
kafka.network:name=RemoteTimeMs,request=*,type=RequestMetrics
kafka.network:name=ResponseQueueTimeMs,request=*,type=RequestMetrics
kafka.network:name=ResponseSendTimeMs,request=*,type=RequestMetrics
kafka.network:name=TotalTimeMs,request=*,type=RequestMetrics
Kafka Broker Sensors - Topics
kafka.server:name=BytesInPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=BytesOutPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=TotalProduceRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=FailedProduceRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=TotalFetchRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=FailedFetchRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.log:type=Log,name=LogEndOffset,topic=*,partition=*