Fluentd and Kafka

  • date post

    14-Jan-2017
  • Category

    Technology


  • Fluentd and Kafka. Hadoop / Spark Conference Japan 2016, Feb 8, 2016

  • Who are you?

    Masahiro Nakagawa, github: @repeatedly

    Treasure Data Inc. Fluentd / td-agent developer. Fluentd Enterprise support.

    I love OSS :) D Language, MessagePack, organizer of several meetups, etc.

    https://www.treasuredata.com/fluentd_enterprise

  • Fluentd: Pluggable streaming event collector

    Lightweight, robust and flexible. Lots of plugins on rubygems (http://www.fluentd.org/plugins). Used by AWS, GCP, MS and more companies.

    Resources: http://www.fluentd.org/ Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk

  • Popular case

    [Diagram: App, Forwarder, Aggregator, Destination. Data is pushed from stage to stage.]

  • Apache Kafka

    Distributed messaging system. Producer - Broker - Consumer pattern. Pull model, replication, etc.

    [Diagram: App, Producer, Broker, Consumer, Destination. The Producer pushes to the Broker; the Consumer pulls from the Broker.]

  • Push vs Pull

    Push: Easy to transfer data to multiple destinations. Hard to control the stream ratio across multiple streams.

    Pull: Easy to control stream flow / ratio. Consumers must be managed correctly.
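
    The trade-off above can be illustrated with a small in-memory sketch. It is not how Fluentd or Kafka are implemented; the class names PushSource and Broker and the per-consumer offset table are hypothetical, chosen only to show that a pushing sender sets the pace for every destination, while pulling consumers each choose their own batch size and must track their own position.

```python
class PushSource:
    """Push model: the sender delivers each event to every destination."""

    def __init__(self, destinations):
        self.destinations = destinations

    def emit(self, event):
        # The sender decides the pace; every destination receives the event now.
        for dest in self.destinations:
            dest.append(event)


class Broker:
    """Pull model: events wait in a log and each consumer tracks an offset."""

    def __init__(self):
        self.log = []
        self.offsets = {}  # consumer name -> index of the next event to read

    def publish(self, event):
        self.log.append(event)

    def pull(self, consumer, max_events):
        # The consumer decides how much to read; the broker only remembers
        # where that consumer left off.
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch


events = ["e%d" % i for i in range(6)]

# Push: both destinations receive all six events at the sender's pace.
fast, slow = [], []
push = PushSource([fast, slow])
for e in events:
    push.emit(e)

# Pull: each consumer reads at its own rate from the same log.
broker = Broker()
for e in events:
    broker.publish(e)
fast_batch = broker.pull("fast", max_events=4)
slow_batch = broker.pull("slow", max_events=1)

print(len(fast), len(slow))
print(fast_batch, slow_batch)
```

    The offset table is the "manage consumers correctly" part of the pull model: if a consumer's offset is lost or wrong, it re-reads or skips events.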

  • There are 2 ways to connect Fluentd and Kafka

    fluent-plugin-kafka

    kafka-fluentd-consumer

  • fluent-plugin-kafka: Input / Output plugin for Kafka

    https://github.com/htgc/fluent-plugin-kafka

    Plugins: in_kafka, in_kafka_group, out_kafka, out_kafka_buffered

    Pros: Easy to use, and supports output to Kafka

    Cons: Performance is not the primary goal

  • Configuration example

    <source>
      @type kafka
      topics web,system
      format json
      add_prefix kafka.
      # more options
    </source>

    # The match pattern below is a placeholder; the original pattern is not preserved in this transcript.
    <match app.**>
      @type kafka_buffered
      output_data_type msgpack
      default_topic metrics
      compression_codec gzip
      required_acks 1
    </match>

    https://github.com/htgc/fluent-plugin-kafka#usage

  • kafka-fluentd-consumer: Stand-alone Kafka consumer for Fluentd

    https://github.com/treasure-data/kafka-fluentd-consumer

    Sends consumed events to Fluentd's in_forward

    Pros: High performance and Java API features

    Cons: Needs a Java runtime

  • Run consumer

    Edit the log4j and fluentd-consumer properties files.

    Run the following command:

    $ java \
        -Dlog4j.configuration=file:///path/to/log4j.properties \
        -jar path/to/kafka-fluentd-consumer-0.2.1-all.jar \
        path/to/fluentd-consumer.properties

  • Properties example

    fluentd.tag.prefix=kafka.event.
    fluentd.record.format=regexp # default is json
    fluentd.record.pattern=(?<text>.*) # for regexp format; the group name "text" is a placeholder, the original name is not preserved in this transcript
    fluentd.consumer.topics=app.* # can use Java Regex
    fluentd.consumer.topics.pattern=blacklist # default is whitelist
    fluentd.consumer.threads=5

    https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
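
    To make the properties above concrete, here is a rough Python sketch of what they describe: selecting topics by regular expression in whitelist or blacklist mode, parsing a consumed line with a named-group pattern, and prefixing the topic to form the Fluentd tag. This is not the consumer's actual Java code. The function names topic_selected and to_event, the group name "text", and the use of full-string matching are assumptions made for illustration. The [tag, time, record] shape is the event form that Fluentd's forward input receives.

```python
import re
import time

# Values mirror the properties example; "text" is an assumed group name.
TAG_PREFIX = "kafka.event."
# Python writes named groups as (?P<name>...); Java writes (?<name>...).
RECORD_PATTERN = re.compile(r"(?P<text>.*)")
TOPICS = re.compile(r"app.*")


def topic_selected(topic, mode="whitelist"):
    """Return True if the topic should be consumed under the given mode."""
    matched = TOPICS.fullmatch(topic) is not None
    # whitelist: consume matching topics; blacklist: consume everything else.
    return matched if mode == "whitelist" else not matched


def to_event(topic, line, now=None):
    """Turn one consumed line into a [tag, time, record] event, or None."""
    m = RECORD_PATTERN.fullmatch(line)
    if m is None:
        return None  # the line does not match the record pattern
    ts = int(time.time()) if now is None else now
    # Each named group becomes a field of the record.
    return [TAG_PREFIX + topic, ts, m.groupdict()]


print(topic_selected("app.web"))
print(topic_selected("system", mode="blacklist"))
print(to_event("app.web", "GET /index.html", now=1454889600))
```

    With the blacklist setting shown in the properties example, topics matching app.* would be skipped and all other topics consumed, which is the opposite of the default whitelist behavior.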

  • With Fluentd example

    <source>
      @type forward
    </source>

    <source>
      @type exec
      command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-0.2.1-all.jar /path/to/config/fluentd-consumer.properties
      tag dummy
      format json
    </source>

    https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec

  • Conclusion

    Kafka is becoming an important component of data platforms.

    Fluentd can communicate with Kafka through the Fluentd plugin and the Kafka consumer.

    Build reliable and flexible data pipelines with Fluentd and Kafka.