Fluentd and Kafka

Hadoop / Spark Conference Japan, Feb 8, 2016

Transcript of Fluentd and Kafka

Page 1: Fluentd and Kafka


Page 2: Fluentd and Kafka

Who are you?

• Masahiro Nakagawa
• github: @repeatedly
• Treasure Data Inc.
  • Fluentd / td-agent developer
  • Fluentd Enterprise support
• I love OSS :)
  • D Language, MessagePack, the organizer of several meetups, etc.

Page 3: Fluentd and Kafka

Fluentd

• Pluggable streaming event collector
• Lightweight, robust, and flexible
• Lots of plugins on rubygems
• Used by AWS, GCP, Microsoft, and more companies
• Resources
  • http://www.fluentd.org/
  • Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk

Page 4: Fluentd and Kafka

Popular case

[Diagram] App → Forwarder → (push) → Aggregator → (push) → Destination
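The push chain above (app → forwarder → aggregator → destination) is typically wired with Fluentd's forward plugins. A minimal sketch of the forwarder side, in v0.12-era syntax; the paths, tags, and host name `aggregator.example.com` are placeholders:

```
# forwarder: tail the app's log and push events onward
<source>
  @type tail
  path /var/log/app.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.access
  format json
</source>

# push everything tagged app.* to the aggregator's in_forward
<match app.**>
  @type forward
  host aggregator.example.com
  port 24224
</match>
```

The aggregator side simply listens with `@type forward` in a <source> block and routes matched events to the destination.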

Page 5: Fluentd and Kafka

Apache Kafka

• Distributed messaging system
• Producer - Broker - Consumer pattern
• Pull model, replication, etc.

[Diagram] App → Producer → (push) → Broker → (pull) → Consumer → Destination

Page 6: Fluentd and Kafka

Push vs Pull

• Push:
  • Easy to transfer data to multiple destinations
  • Hard to control the stream ratio across multiple streams

• Pull:
  • Easy to control stream flow / ratio
  • Must manage consumers correctly

Page 7: Fluentd and Kafka

There are 2 ways

• fluent-plugin-kafka

• kafka-fluentd-consumer

Page 8: Fluentd and Kafka

fluent-plugin-kafka

• Input / output plugin for Kafka
• https://github.com/htgc/fluent-plugin-kafka
  • in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
• Pros
  • Easy to use; supports output to Kafka
• Cons
  • Performance is not a primary focus

Page 9: Fluentd and Kafka

Configuration example

<source>
  @type kafka
  topics web,system
  format json
  add_prefix kafka.
  # more options
</source>

<match kafka.**>
  @type kafka_buffered
  output_data_type msgpack
  default_topic metrics
  compression_codec gzip
  required_acks 1
</match>

https://github.com/htgc/fluent-plugin-kafka#usage
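With the add_prefix setting above, records consumed by in_kafka arrive tagged with the prefix plus the topic name (kafka.web, kafka.system), so a downstream <match> can route each topic separately. A minimal sketch, using stdout as a stand-in destination:

```
# route records consumed from the "web" topic; stdout is a placeholder destination
<match kafka.web>
  @type stdout
</match>
```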

Page 10: Fluentd and Kafka

kafka-fluentd-consumer

• Stand-alone Kafka consumer for Fluentd
• https://github.com/treasure-data/kafka-fluentd-consumer
• Sends consumed events to Fluentd's in_forward
• Pros
  • High performance and Java API features
• Cons
  • Needs a Java runtime

Page 11: Fluentd and Kafka

Run consumer

• Edit the log4j and fluentd-consumer properties files

• Run the following command:

$ java \
  -Dlog4j.configuration=file:///path/to/log4j.properties \
  -jar path/to/kafka-fluentd-consumer-0.2.1-all.jar \
  path/to/fluentd-consumer.properties

Page 12: Fluentd and Kafka

Properties example

fluentd.tag.prefix=kafka.event.

fluentd.record.format=regexp # default is json

fluentd.record.pattern=(?<text>.*) # for regexp format

fluentd.consumer.topics=app.* # can use Java regex

fluentd.consumer.topics.pattern=blacklist # default is whitelist

fluentd.consumer.threads=5

https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties

Page 13: Fluentd and Kafka

With Fluentd example

<source>
  @type forward
</source>

<source>
  @type exec
  command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-0.2.1-all.jar /path/to/config/fluentd-consumer.properties
  tag dummy
  format json
</source>

https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec

Page 14: Fluentd and Kafka

Conclusion

• Kafka has become an important component of data platforms

• Fluentd can communicate with Kafka
  • via the Fluentd plugin and the Kafka consumer

• Build reliable and flexible data pipelines with Fluentd and Kafka