Fluentd and Kafka
date post
14-Jan-2017Category
Technology
view
4.423download
0
Embed Size (px)
Transcript of Fluentd and Kafka
Fluentd and KafkaHadoop / Spark Conference Japan 2016 Feb 8, 2016
Who are you?
Masahiro Nakagawa github: @repeatedly
Treasure Data Inc. Fluentd / td-agent developer Fluentd Enterprise support
I love OSS :) D Language, MessagePack, The organizer of several meetups, etc
https://www.treasuredata.com/fluentd_enterprise
Fluentd Pluggable streaming event collector
Lightweight, robust and flexible Lots of plugins on rubygems Used by AWS, GCP, MS and more companies
Resources http://www.fluentd.org/ Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk
http://www.fluentd.org/pluginshttp://www.fluentd.org/https://www.youtube.com/watch?v=6uPB_M7cbYk
Popular case
App
Push
Push
Forwarder Aggregator Destination
Distributed messaging system Producer - Broker - Consumer pattern Pull model, replication, etc
Apache Kafka
App
PushPull
Producer Broker DestinationConsumer
Push vs Pull
Push: Easy to transfer data to multiple destinations Hard to control stream ratio in multiple streams
Pull: Easy to control stream flow / ratio Should manage consumers correctly
There are 2 ways
fluent-plugin-kafka
kafka-fluentd-consumer
fluent-plugin-kafka Input / Output plugin for kafka
https://github.com/htgc/fluent-plugin-kafka in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
Pros Easy to use and output support
Cons Performance is not primary
https://github.com/htgc/fluent-plugin-kafka
Configuration example
@type kafka topics web,system format json add_prefix kafka. # more options
@type kafka_buffered output_data_type msgpack default_topic metrics compression_codec gzip required_acks 1
https://github.com/htgc/fluent-plugin-kafka#usage
https://github.com/htgc/fluent-plugin-kafka#usage
kafka fluentd consumer Stand-alone kafka consumer for fluentd
https://github.com/treasure-data/kafka-fluentd-consumer
Send cosumed events to fluentds in_forward
Pros High performance and Java API features
Cons Need Java runtime
https://github.com/treasure-data/kafka-fluentd-consumer
Run consumer
Edit log4j and fluentd-consumer properties
Run following command: $ java \ -Dlog4j.configuration=file:///path/to/log4j.properties \ -jar path/to/kafka-fluentd-consumer-0.2.1-all.jar \ path/to/fluentd-consumer.properties
Properties examplefluentd.tag.prefix=kafka.event.
fluentd.record.format=regexp # default is json
fluentd.record.pattern=(?.*) # for regexp format
fluentd.consumer.topics=app.* # can use Java Rege
fluentd.consumer.topics.pattern=blacklist # default is whitelist
fluentd.consumer.threads=5
https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
With Fluentd example
@type forward
@type exec command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-0.2.1-all.jar /path/to/config/fluentd-consumer.properties tag dummy format json
https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec
https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec
Conclusion
Kafka is now becomes important component on data platform
Fluentd can communicate with Kafka Fluentd plugin and kafka consumer
Building reliable and flexible data pipeline with Fluentd and Kafka