Experience with Kafka & Storm

Page 1: Experience with Kafka & Storm

Target and Connect Intelligently

Experience with Kafka & Storm

Otto Mok
Solution Architect, AcuityAds
April 30, 2014 – Toronto Hadoop User Group

Page 2: Experience with Kafka & Storm

Agenda

• Background – What does AcuityAds do?

• Use case – What are we trying to do?

• High-level System Architecture – How does the data flow?

• Kafka & Storm – What did we do wrong?

Page 3: Experience with Kafka & Storm

Background

Source: https://www.google.ca/search?q=banner+ads&tbm=isch&tbo=u

Page 4: Experience with Kafka & Storm

Background

• Digital advertising – website banners, pre-roll video, free mobile apps

• Buy ad impressions in real time – respond within 50 ms to each auction

• Find the best match between people and ads – show ads that you care about

• Use machine learning algorithms to 'learn' – data, data, data

Page 5: Experience with Kafka & Storm

Use case

• 10+ billion daily impressions
• 30,000+ new sites daily

• How many daily impressions by site?

• How are the impressions distributed?
– Country, Province, Gender, Age Range, etc...

Page 6: Experience with Kafka & Storm

High-level System Architecture

• 10+ billion daily bid requests

• Make up to 4 billion daily bids

• Serve millions of daily impressions

• 10+ TB of messages daily

• 300k+ message / second

Data flow: Bidder / Adserver → Kafka → Storm → HBase/Hadoop
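A quick sanity check on these figures: 10 billion messages a day averages out to roughly 116k messages/second, so the 300k+ messages/second figure reflects peak rather than average load. A minimal sketch of the arithmetic:

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        long dailyMessages = 10_000_000_000L;  // 10+ billion bid requests per day
        long secondsPerDay = 24L * 60 * 60;    // 86,400 seconds in a day
        long avgPerSecond = dailyMessages / secondsPerDay;
        // 10e9 / 86,400 ≈ 115,740/s on average; 300k+/s is the peak rate
        System.out.println("Average messages/second: " + avgPerSecond);
    }
}
```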

Page 7: Experience with Kafka & Storm

Kafka

Source: http://kafka.apache.org/documentation.html

Page 8: Experience with Kafka & Storm

Kafka - Spec

• Kafka v0.8.0
• Servers – 10 × 2U (10 × 3 TB JBOD each)
• Total storage – 300 TB
• Replication – 3x
• Unique data – 100 TB
• Capacity – a few days
• Producer acknowledgment – never waits
• Topic – BIDREQUEST
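The "never waits" acknowledgment mode corresponds to `request.required.acks=0` in the 0.8 producer API: fire-and-forget, trading delivery guarantees for throughput. A minimal sketch of such a producer configuration; the broker host names are hypothetical:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties bidRequestProducerConfig() {
        Properties props = new Properties();
        // Hypothetical broker list for the 10-node cluster
        props.put("metadata.broker.list", "kafka01:9092,kafka02:9092,kafka03:9092");
        // 0 = fire-and-forget: the producer never waits for an acknowledgment,
        // matching the "never waits" setting in the spec above
        props.put("request.required.acks", "0");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(bidRequestProducerConfig().getProperty("request.required.acks"));
    }
}
```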

Page 9: Experience with Kafka & Storm

Kafka - Monitoring

• Nagios – ping, CPU, memory, network I/O, disk space

• Producer–consumer group message counting – hourly consumption-rate check

Topic       Consumer Group ID        Producer Count  Consumer Count  Error            Ratio
BIDREQUEST  InventoryTopology        122,450,812     122,444,294     None             1.00
BIDREQUEST  SearchTargetingTopology  122,450,812     107,755,295     Ratio below 98%  0.88
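The hourly check above boils down to comparing the consumer count against the producer count per consumer group and flagging any ratio below 98%. A minimal sketch, using the table's numbers:

```java
public class ConsumptionRatioCheck {
    // Returns an error message when the consumed/produced ratio drops below 98%,
    // or null when the consumer group is keeping up with the producers
    static String check(long producerCount, long consumerCount) {
        double ratio = (double) consumerCount / producerCount;
        return ratio < 0.98 ? String.format("Ratio below 98%%: %.2f", ratio) : null;
    }

    public static void main(String[] args) {
        // Numbers from the monitoring table above
        System.out.println(check(122_450_812L, 122_444_294L)); // prints "null": ratio ~1.00
        System.out.println(check(122_450_812L, 107_755_295L)); // prints "Ratio below 98%: 0.88"
    }
}
```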

Page 10: Experience with Kafka & Storm

Kafka - Monitoring

• Kafka Web Console – partition offset for each consumer group

Page 11: Experience with Kafka & Storm

Kafka - Issues

• Issue 1 – Partitions
– 10 partitions
– Each partition grows by > 1 TB a day
– 100 TB / 1 TB – no problem!

• But each partition is stored in a single directory, i.e. entirely on one JBOD disk:
– /disk05/kafka-logs/BIDREQUEST-09
– /disk09/kafka-logs/BIDREQUEST-03

• So a single 3 TB disk caps how much of a partition can be retained, even though the cluster as a whole has space to spare
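The cluster-level arithmetic hides a per-disk problem: because a partition lives in one directory on one disk, retention is bounded by a 3 TB disk rather than by the 300 TB cluster. A sketch of that arithmetic, assuming the ~1 TB/day per-partition growth rate above:

```java
public class PartitionRetentionCheck {
    public static void main(String[] args) {
        double diskTB = 3.0;             // one JBOD disk holds the whole partition directory
        double partitionTBPerDay = 1.0;  // each partition grows by > 1 TB a day
        double maxRetentionDays = diskTB / partitionTBPerDay;
        // A 3 TB disk caps retention at ~3 days per partition,
        // regardless of the 300 TB of total cluster storage
        System.out.println("Max retention days per partition: " + maxRetentionDays);
    }
}
```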

Page 12: Experience with Kafka & Storm

Kafka - Issues

• Issue 2 – Unbalanced partition distribution
– Some servers run out of space
– Some servers are not the "leader" for any partition

• A network glitch causes a server to drop out of the cluster; after rejoining, it is no longer the leader for its partitions

• Fix: auto.leader.rebalance.enable=true

Page 13: Experience with Kafka & Storm

Lots of data – now what?

Source: http://bookriotcom.c.presscdn.com/wp-content/uploads/2013/03/server-farm-shot.jpg

Page 14: Experience with Kafka & Storm

Use case - again

• 10+ billion daily impressions
• 30,000+ new sites daily

• How many daily impressions by site?

• How are the impressions distributed?
– Country, Province, Gender, Age Range, etc...

Page 15: Experience with Kafka & Storm

Storm

Source: http://storm.incubator.apache.org/documentation/Tutorial.html

Page 16: Experience with Kafka & Storm

Storm - Spec

• Storm v0.8.2
• Servers – 13 × dual quad-core Xeon, 36 GB RAM
• 4 worker slots per server
• Total logical CPUs – 208
• Total memory – 468 GB
• Total slots – 52 worker slots (JVMs)
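The totals follow directly from the per-server figures, assuming hyper-threading on the dual quad-core Xeons (16 logical CPUs per server):

```java
public class StormClusterTotals {
    public static void main(String[] args) {
        int servers = 13;
        int logicalCpusPerServer = 2 * 4 * 2; // dual quad-core with hyper-threading
        int ramGBPerServer = 36;
        int slotsPerServer = 4;
        System.out.println("Logical CPUs: " + servers * logicalCpusPerServer); // 208
        System.out.println("Memory (GB):  " + servers * ramGBPerServer);       // 468
        System.out.println("Worker slots: " + servers * slotsPerServer);       // 52
    }
}
```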

Page 17: Experience with Kafka & Storm

Storm - Monitor

Page 18: Experience with Kafka & Storm

Storm - Topology

• Spout reads each BidRequest from the Kafka topic
• Determines whether the inventory is new or existing, and emits tuples to different "streams"
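The new-vs-existing routing decision can be sketched independently of the Storm API: track which inventory keys have been seen, and pick the stream name accordingly. The stream names match the NewInventory/ExistingInventory streams the bolts below consume; the in-memory seen-set here is a hypothetical stand-in for whatever state the real topology consults:

```java
import java.util.HashSet;
import java.util.Set;

public class InventoryStreamRouter {
    private final Set<String> seenInventory = new HashSet<>();

    // Decide which stream a bid request's inventory tuple belongs on:
    // first sighting goes to NewInventory, later sightings to ExistingInventory
    public String route(String sourceId, String domainName) {
        String key = sourceId + "|" + domainName;
        return seenInventory.add(key) ? "NewInventory" : "ExistingInventory";
    }

    public static void main(String[] args) {
        InventoryStreamRouter router = new InventoryStreamRouter();
        System.out.println(router.route("42", "example.com")); // NewInventory
        System.out.println(router.route("42", "example.com")); // ExistingInventory
    }
}
```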

Page 19: Experience with Kafka & Storm

Storm - Topology

• InsertInventoryBolt
– Processes tuples from the NewInventory stream
– Field grouping on sourceId, domainName
– Tick tuple every 1 second

• UpdateInventoryBolt
– Processes tuples from the ExistingInventory stream
– Field grouping on inventoryId
– Tick tuple every 1 second

Page 20: Experience with Kafka & Storm

Storm - Topology

• LogInventoryBolt
– Processes tuples from the ExistingInventory stream
– Field grouping on inventoryId
– Tick tuple every 10 seconds
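All three bolts follow the same pattern: accumulate per-key updates in memory and flush them when a tick tuple arrives, rather than writing downstream per tuple. A minimal sketch of that accumulate-and-flush pattern, with the Storm tick mechanics left out:

```java
import java.util.HashMap;
import java.util.Map;

public class BatchingCounter {
    private final Map<String, Long> pending = new HashMap<>();

    // Called once per incoming tuple: just accumulate in memory
    public void count(String inventoryId) {
        pending.merge(inventoryId, 1L, Long::sum);
    }

    // Called on each tick tuple: hand back the batch and start fresh;
    // the real bolt would write this batch to HBase here
    public Map<String, Long> flush() {
        Map<String, Long> batch = new HashMap<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        BatchingCounter counter = new BatchingCounter();
        counter.count("inv-1");
        counter.count("inv-1");
        counter.count("inv-2");
        System.out.println(counter.flush()); // counts for inv-1 and inv-2
        System.out.println(counter.flush()); // empty: previous flush cleared the batch
    }
}
```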

Page 21: Experience with Kafka & Storm

Storm - Issues

• Issue – Low uptime
– 10 workers, 100 executors
– Not processing many tuples
– Process latency < 10 ms

• Bolt restarts due to uncaught exceptions, which take down the whole worker JVM
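In Storm 0.8, an exception that escapes a bolt kills the worker JVM, which the supervisor then restarts; hence the low uptime. The usual guard is to catch per-tuple failures inside the bolt and report the offending tuple instead of letting it crash the worker. A sketch of that guard, with the per-tuple work reduced to a hypothetical parse step:

```java
public class GuardedExecuteSketch {
    // Stand-in for the real per-tuple work, which may throw on malformed input
    static long parseImpressionCount(String field) {
        return Long.parseLong(field);
    }

    // Wrap per-tuple work so one bad tuple is logged and skipped instead of
    // crashing the worker; a real bolt would collector.fail(tuple) here
    static boolean safeExecute(String field) {
        try {
            parseImpressionCount(field);
            return true;   // processed (and would be acked)
        } catch (Exception e) {
            System.err.println("Skipping bad tuple: " + e);
            return false;  // failed, but the worker stays up
        }
    }

    public static void main(String[] args) {
        System.out.println(safeExecute("12345"));      // true
        System.out.println(safeExecute("not-a-long")); // false, no crash
    }
}
```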

Page 22: Experience with Kafka & Storm

Conclusion

• Cost
– Bleeding-edge technology bugs
– Support only via mailing lists
– Monitoring – roll your own
– Operations – dedicated personnel

• Benefit
– Near real-time data on site impression volume & distribution by geo, demo, etc...

Page 23: Experience with Kafka & Storm

Forward Looking

• Kafka v0.8.1.1
– Allows specifying the broker hostname for producers & consumers
– Allows changing the # of partitions of a topic online

• Storm v0.9.1
– Faster pure-Java Netty transport
– View logs from each server in the Storm UI
– Tick tuples using floating-point seconds
– Storm on Hadoop (HDP 2.1)

Page 24: Experience with Kafka & Storm

Thank you

Otto Mok
[email protected]

Source: http://jamesgieordano.files.wordpress.com/2011/05/babyelephant.jpg