Experience with Kafka & Storm
Target and Connect Intelligently
Otto Mok, Solution Architect, AcuityAds
April 30, 2014, Toronto Hadoop User Group
2
Agenda
• Background
  – What does AcuityAds do?
• Use case
  – What are we trying to do?
• High-level System Architecture
  – How does the data flow?
• Kafka & Storm
  – What did we do wrong?
3
Background
Source: https://www.google.ca/search?q=banner+ads&tbm=isch&tbo=u
4
Background
• Digital advertising
  – Website banners, pre-roll video, free mobile apps
• Buy ad impressions in ‘real time’
  – Respond within 50 ms for each auction
• Find the best match between people and ads
  – Show ads that you care about
• Use machine learning algorithms to ‘learn’
  – Data, data, data
5
Use case
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
  – Country, Province, Gender, Age Range, etc.
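The two questions above amount to grouped counts. A minimal in-memory sketch, assuming each impression arrives as a dict with a `site` field plus the demographic fields; the field names and sample values are hypothetical. At 10+ billion impressions a day the same counting has to be spread across machines, which is the job the rest of the deck gives to Kafka and Storm.

```python
from collections import Counter, defaultdict

DIMENSIONS = ("country", "province", "gender", "age_range")  # hypothetical field names


def aggregate(impressions, dimensions=DIMENSIONS):
    """Count impressions per site, and per value of each dimension within each site."""
    by_site = Counter()
    distribution = defaultdict(lambda: defaultdict(Counter))
    for imp in impressions:
        site = imp["site"]
        by_site[site] += 1
        for dim in dimensions:
            # Missing demographic fields are counted under "unknown".
            distribution[site][dim][imp.get(dim, "unknown")] += 1
    return by_site, distribution


sample = [
    {"site": "example.com", "country": "CA", "province": "ON", "gender": "F", "age_range": "25-34"},
    {"site": "example.com", "country": "CA", "province": "QC", "gender": "M", "age_range": "18-24"},
    {"site": "news.example", "country": "US", "gender": "F"},
]
counts, dist = aggregate(sample)
print(counts.most_common())
print(dict(dist["example.com"]["province"]))
```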
6
High-level System Architecture
• 10+ billion daily bid requests
• Make up to 4 billion daily bids
• Serve millions of daily impressions
• 10+ TB of messages daily
• 300k+ messages/second
[Architecture diagram: Bidder, Adserver, Kafka, HBase/Hadoop, Storm]
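A back-of-envelope check of the volumes on this slide, as a sketch. It assumes 1 TB = 10^12 bytes and, for the per-message size, that the 10+ TB consists entirely of the 10 billion bid requests; the slides do not state either.

```python
# Back-of-envelope throughput figures for the architecture slide.
# Assumptions: 1 TB = 10**12 bytes; all 10 TB/day are bid-request messages.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(daily_count: float) -> float:
    """Average events per second implied by a daily total."""
    return daily_count / SECONDS_PER_DAY


def average_message_bytes(daily_bytes: float, daily_messages: float) -> float:
    """Mean message size implied by daily byte and message totals."""
    return daily_bytes / daily_messages


bid_requests_per_day = 10e9
daily_bytes = 10 * 10**12

print(f"{average_rate(bid_requests_per_day):,.0f} requests/s on average")
print(f"{average_message_bytes(daily_bytes, bid_requests_per_day):,.0f} bytes/message")
print(f"{300_000 * SECONDS_PER_DAY:,} messages/day at a sustained 300k/s")
```

Ten billion requests a day averages about 116k per second, so the 300k+ messages/second figure presumably reflects peak load or message types besides bid requests; the slides don't say which.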
7
Kafka
Source: http://kafka.apache.org/documentation.html
8
Kafka - Spec
• Kafka v0.8.0
• Servers – 10 x 2U (10 x 3 TB) JBOD
• Total storage – 300 TB
• Replication – 3x
• Unique data – 100 TB
• Capacity – a few days
• Producer acknowledgment – never waits
• Topic – BIDREQUEST
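How the storage figures on this slide relate, as a sketch. The 10 TB/day ingest rate is an assumption taken from the "10+ TB of messages daily" figure on the architecture slide.

```python
# Kafka storage arithmetic from the spec slide.
# Assumption: plan for 10 TB/day of ingest (the slides say "10+ TB").


def kafka_capacity(servers, disks_per_server, disk_tb, replication, daily_ingest_tb):
    """Return raw TB, unique (de-replicated) TB, and the retention ceiling in days."""
    raw_tb = servers * disks_per_server * disk_tb
    unique_tb = raw_tb / replication
    retention_days = unique_tb / daily_ingest_tb
    return raw_tb, unique_tb, retention_days


raw_tb, unique_tb, max_days = kafka_capacity(
    servers=10, disks_per_server=10, disk_tb=3, replication=3, daily_ingest_tb=10
)
print(f"raw={raw_tb} TB, unique={unique_tb:.0f} TB, at most {max_days:.0f} days")
```

Ten days is only a ceiling: free-space headroom, ingest above 10 TB/day, and uneven placement across disks all push the practical figure down toward the "few days" on the slide.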
9
Kafka - Monitoring
• Nagios
  – Ping, CPU, memory, network I/O, disk space
• Producer/consumer-group message counting
  – Hourly consumption rate check
Topic      | Consumer Group ID       | Producer Count | Consumer Count | Error           | Ratio
BIDREQUEST | InventoryTopology       | 122,450,812    | 122,444,294    | None            | 1.00
BIDREQUEST | SearchTargetingTopology | 122,450,812    | 107,755,295    | Ratio below 98% | 0.88
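The hourly check compares how many messages the producers wrote with how many each consumer group read. A minimal sketch of that comparison, assuming the hourly counts are already collected (how AcuityAds gathered them isn't shown); the 98% threshold and the error text come from the monitoring table.

```python
def check_consumption(producer_count: int, consumer_count: int, threshold: float = 0.98):
    """Compare a consumer group's hourly count with the producers' count.

    Returns the ratio rounded to two places and an error label in the style of
    the monitoring table ("None" when the group is keeping up).
    """
    ratio = consumer_count / producer_count
    error = "None" if ratio >= threshold else f"Ratio below {threshold:.0%}"
    return round(ratio, 2), error


rows = [
    ("BIDREQUEST", "InventoryTopology", 122_450_812, 122_444_294),
    ("BIDREQUEST", "SearchTargetingTopology", 122_450_812, 107_755_295),
]
for topic, group, produced, consumed in rows:
    ratio, error = check_consumption(produced, consumed)
    print(f"{topic} {group}: ratio={ratio:.2f} error={error}")
```

The threshold is applied to the unrounded ratio, so a group at 97.6% is flagged even though it would display as 0.98.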
10
Kafka - Monitoring
• Kafka Web Console
  – Partition offsets for each consumer group
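The console shows, per consumer group, the offset each partition has reached. Subtracting that from the partition's latest (log-end) offset gives the group's lag, the number most often alerted on. A sketch with made-up offsets; fetching the real ones (from ZooKeeper or the brokers) is left out, and treating a partition with no committed offset as starting at 0 is an assumption.

```python
def consumer_lag(log_end_offsets: dict, group_offsets: dict) -> dict:
    """Per-partition lag: messages written but not yet consumed by the group.

    A partition with no committed offset is treated as starting at 0 (an
    assumption; a real consumer may be configured to start from the latest offset).
    """
    return {
        partition: end - group_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Made-up offsets for three partitions of one topic.
log_end = {0: 5_000, 1: 5_200, 2: 4_900}
committed = {0: 4_990, 1: 3_100, 2: 4_900}
lag = consumer_lag(log_end, committed)
worst = max(lag, key=lag.get)
print(lag, "total:", sum(lag.values()), "worst partition:", worst)
```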
11
Kafka - Issues
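• Issue 1 – Partitions
  – 10 partitions
  – Each partition > 1 TB a day
  – 100 TB / 1 TB – no problem!
• Each partition is stored in a single directory on a single disk
  – /disk05/kafka-logs/BIDREQUEST-09
  – /disk09/kafka-logs/BIDREQUEST-03
  – A 3 TB disk holding one partition fills in under 3 days, regardless of total cluster capacity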
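The arithmetic behind Issue 1, as a sketch. It assumes all 10 TB/day goes to BIDREQUEST and relies on Kafka storing each partition replica in one log directory, i.e. on one disk of a JBOD broker. The 3-day retention target and 80% fill limit used with `partitions_needed` are illustrative assumptions, not figures from the talk.

```python
import math


def partition_fill(daily_topic_tb, partitions, disk_tb):
    """TB per partition per day, and days until one partition fills the disk it lives on."""
    per_partition_tb = daily_topic_tb / partitions
    return per_partition_tb, disk_tb / per_partition_tb


def disks_holding_topic(partitions, replication, total_disks):
    """Upper bound on disks holding any replica: each replica lives in one log directory."""
    return min(partitions * replication, total_disks)


def partitions_needed(daily_topic_tb, disk_tb, retention_days, max_disk_fill=0.8):
    """Smallest partition count whose retained data per partition fits within max_disk_fill of a disk."""
    return math.ceil(daily_topic_tb * retention_days / (disk_tb * max_disk_fill))


per_tb, days = partition_fill(daily_topic_tb=10, partitions=10, disk_tb=3)
print(f"{per_tb:.1f} TB/partition/day, disk full in {days:.1f} days")
print("disks used:", disks_holding_topic(partitions=10, replication=3, total_disks=100), "of 100")
print("partitions for 3 days at 80% fill:", partitions_needed(10, 3, retention_days=3))
```

With 10 partitions and 3x replication, at most 30 of the 100 disks ever hold BIDREQUEST data, and each of those fills in about three days, so most of the 300 TB is unusable for this topic.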
12
Kafka - Issues
• Issue 2 – Unbalanced partition distribution
  – Some servers running out of space
  – Some servers are not “leader” for any partition
• A network glitch causes a server to drop out of the cluster; it is no longer leader after it rejoins
• auto.leader.rebalance.enable=true
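A toy model of Issue 2, with three brokers and made-up replica lists. Kafka treats the first replica in a partition's assignment as its preferred leader. When a broker drops out, leadership moves to another replica and stays there after the broker rejoins; a preferred-replica election, which `auto.leader.rebalance.enable=true` lets the controller trigger periodically, moves it back. The sketch ignores the in-sync replica set and simply picks the next listed replica.

```python
from collections import Counter


def fail_over(assignment, leaders, failed_broker):
    """Move leadership off a failed broker to the next listed replica.

    Simplification: assumes every other replica is alive and in sync.
    """
    new_leaders = dict(leaders)
    for partition, replicas in assignment.items():
        if new_leaders[partition] == failed_broker:
            new_leaders[partition] = next(b for b in replicas if b != failed_broker)
    return new_leaders


def preferred_replica_election(assignment, leaders, live_brokers):
    """Return leadership to each partition's first-listed (preferred) replica if it is live."""
    return {
        partition: replicas[0] if replicas[0] in live_brokers else leaders[partition]
        for partition, replicas in assignment.items()
    }


# Made-up assignment: partition -> replica broker ids, preferred leader first.
assignment = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
leaders = {p: replicas[0] for p, replicas in assignment.items()}

after_glitch = fail_over(assignment, leaders, failed_broker=1)
print("after broker 1 rejoins:", Counter(after_glitch.values()))  # broker 1 leads nothing

rebalanced = preferred_replica_election(assignment, after_glitch, live_brokers={1, 2, 3})
print("after preferred-replica election:", Counter(rebalanced.values()))
```

Broker 2 ends up leading two partitions and broker 1 none, which is the "not leader for any partition" symptom; the election restores one leader per broker.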
13
Lots of data – now what?
Source: http://bookriotcom.c.presscdn.com/wp-content/uploads/2013/03/server-farm-shot.jpg
14
Use case - again
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
  – Country, Province, Gender, Age Range, etc.
15
Storm
Source: http://storm.incubator.apache.org/documentation/Tutorial.html
16
Storm - Spec
• Storm v0.8.2
• Servers – 13 x dual quad-core Xeon, 36 GB RAM
• 4 worker slots per server
• Total logical CPUs – 208
• Total memory – 468 GB
• Total slots – 52 worker slots (JVMs)
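A quick check that the cluster totals follow from the per-server figures. Two hardware threads per core (Hyper-Threading) is an assumption, but it is what makes 13 dual quad-core servers add up to 208 logical CPUs. The per-worker memory figure is an upper bound that ignores the OS and the supervisor process.

```python
def storm_cluster_totals(servers, sockets, cores_per_socket, threads_per_core,
                         ram_gb, slots_per_server):
    """Cluster-wide totals for identical Storm supervisor nodes."""
    return {
        "logical_cpus": servers * sockets * cores_per_socket * threads_per_core,
        "memory_gb": servers * ram_gb,
        "worker_slots": servers * slots_per_server,
        "max_gb_per_worker": ram_gb / slots_per_server,
    }


# threads_per_core=2 is an assumption (Hyper-Threading), inferred from the 208 total.
totals = storm_cluster_totals(servers=13, sockets=2, cores_per_socket=4,
                              threads_per_core=2, ram_gb=36, slots_per_server=4)
for name, value in totals.items():
    print(f"{name}: {value:g}")
```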
17
Storm - Monitor
18
Storm - Topology
• Spout reads each BidRequest from the Kafka topic
• Determines whether the inventory is new or existing and emits tuples to different “streams”
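A sketch of the routing decision on this slide, in plain Python rather than the Storm API. The stream names and the sourceId, domainName, and inventoryId fields come from the topology slides; the request dict format and the `known_inventory` lookup are hypothetical. In a Java spout the equivalent is declaring both streams in `declareOutputFields` and passing the stream id to `collector.emit`.

```python
def route_bid_request(bid_request: dict, known_inventory: dict):
    """Decide which stream a BidRequest goes to and what tuple to emit.

    known_inventory is a hypothetical map (sourceId, domainName) -> inventoryId;
    where that lookup lives in the real topology is not described in the slides.
    """
    key = (bid_request["sourceId"], bid_request["domainName"])
    inventory_id = known_inventory.get(key)
    if inventory_id is None:
        return "NewInventory", key
    return "ExistingInventory", (inventory_id,)


known = {("src-1", "example.com"): 42}
for request in ({"sourceId": "src-1", "domainName": "example.com"},
                {"sourceId": "src-2", "domainName": "news.example"}):
    stream, values = route_bid_request(request, known)
    print(stream, values)
```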
19
Storm - Topology
• InsertInventoryBolt
  – Processes tuples from the NewInventory stream
  – Fields grouping on sourceId, domainName
  – Tick tuple every 1 second
• UpdateInventoryBolt
  – Processes tuples from the ExistingInventory stream
  – Fields grouping on inventoryId
  – Tick tuple every 1 second
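Two mechanisms from these bolts, sketched in plain Python. Fields grouping sends every tuple with the same grouping-field values to the same bolt task; Storm does this by hashing the field values modulo the number of tasks, and here CRC32 stands in for Storm's hash (Python's built-in `hash` of strings varies between processes). Tick tuples are system tuples Storm delivers at the interval set by `topology.tick.tuple.freq.secs`. What these bolts do on a tick isn't stated; flushing a batch of buffered writes is assumed here as a common use.

```python
import zlib


def fields_grouping_task(field_values, num_tasks: int) -> int:
    """Map grouping-field values to a task index; equal values always map to the same task."""
    key = "\x1f".join(str(v) for v in field_values).encode("utf-8")
    return zlib.crc32(key) % num_tasks


class BatchingBolt:
    """Buffers tuples and hands them to `flush` as one batch when a tick tuple arrives."""

    def __init__(self, flush):
        self.flush = flush  # e.g. a hypothetical bulk insert/update into the inventory store
        self.pending = []

    def execute(self, tup, is_tick: bool = False):
        if is_tick:
            if self.pending:
                self.flush(self.pending)
                self.pending = []
        else:
            self.pending.append(tup)


# Same (sourceId, domainName) -> same task, so one task owns each new inventory item.
print(fields_grouping_task(("src-1", "example.com"), 8))

batches = []
bolt = BatchingBolt(batches.append)
bolt.execute(("src-1", "example.com"))
bolt.execute(("src-2", "news.example"))
bolt.execute(None, is_tick=True)  # 1-second tick: flush both buffered tuples
bolt.execute(None, is_tick=True)  # nothing buffered, nothing flushed
print(batches)
```

Grouping InsertInventoryBolt on sourceId and domainName presumably keeps two tasks from racing to insert the same new inventory item, and grouping on inventoryId does the same for updates.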
20
Storm - Topology
• LogInventoryBolt
  – Processes tuples from the ExistingInventory stream
  – Fields grouping on inventoryId
  – Tick tuple every 10 seconds
21
Storm - Issues
• Issue – Low uptime
  – 10 workers, 100 executors
  – Not processing many tuples
  – Process latency < 10 ms
• Bolts restart due to uncaught exceptions
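An uncaught exception in a bolt takes down its worker and Storm restarts it, which is consistent with the low uptime on this slide. A minimal guard, sketched in plain Python: catch exceptions per tuple, report them, and fail the tuple instead of letting the exception escape. The `Collector` class is a hypothetical stand-in mirroring the `ack`, `fail`, and `reportError` methods of Storm's Java `OutputCollector`; the `price` field is likewise made up.

```python
class Collector:
    """Hypothetical stand-in for Storm's OutputCollector; records what the bolt reports."""

    def __init__(self):
        self.acked, self.failed, self.errors = [], [], []

    def ack(self, tup):
        self.acked.append(tup)

    def fail(self, tup):
        self.failed.append(tup)

    def report_error(self, exc):
        self.errors.append(exc)


def safe_execute(process, tup, collector):
    """Run the bolt's logic on one tuple without letting an exception kill the worker."""
    try:
        process(tup)
    except Exception as exc:  # deliberately broad: any escaped exception restarts the worker
        collector.report_error(exc)
        collector.fail(tup)  # lets a reliable spout replay it
    else:
        collector.ack(tup)


def parse_price(tup):
    """Example bolt logic that raises on malformed input."""
    return float(tup["price"])


collector = Collector()
for tup in ({"price": "1.25"}, {"price": "n/a"}, {}):
    safe_execute(parse_price, tup, collector)
print(len(collector.acked), "acked,", len(collector.failed), "failed")
```

Failing a tuple makes a reliable spout replay it, which is right for transient errors but loops forever on permanently malformed input; for those, logging and acking is the usual alternative.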
22
Conclusion
• Cost
  – Bleeding-edge technology bugs
  – Support: mailing lists
  – Monitoring: roll your own
  – Operations: dedicated personnel
• Benefit
  – Near real-time data on site impression volume and distribution by geography, demographics, etc.
23
Forward Looking
• Kafka v0.8.1.1
  – Allows specifying the broker hostname for producers and consumers
  – Allows changing the number of partitions of a topic online
• Storm v0.9.1
  – Faster pure-Java Netty transport
  – View logs from each server in the Storm UI
  – Tick tuple using floating-point seconds
  – Storm on Hadoop (HDP 2.1)
24
Thank you
Otto Mok
[email protected]
Source: http://jamesgieordano.files.wordpress.com/2011/05/babyelephant.jpg