Masahiro Nakagawa
June 1, 2015
Fluentd meetup 2015 Summer
Fluentd - v0.12 master guide -
#fluentdmeetup
Who are you?
> Masahiro Nakagawa
> github/twitter: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> I love OSS :)
> D language - Phobos committer
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of Presto Source Code Reading / meetup
> etc…
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
What’s Fluentd?
> Data collector for a unified logging layer
> Streaming data transfer based on JSON
> Written in Ruby
> Various gem-based plugins
> http://www.fluentd.org/plugins
> Working in production
> http://www.fluentd.org/testimonials
v0.10 (old stable)
> Mainly for log forwarding
> with good performance
> working in production
> with td-agent 1 and td-agent 2.0 / 2.1
> Robust, but not good for log processing
Architecture (v0.10)

[Diagram: pluggable Input (Forward, HTTP, File tail, dstat, ...), Buffer (File, Memory) and Output (Forward, File, MongoDB, rewrite, ...) plugins around the core Engine]
v0.12 (current stable)
> v1 configuration by default
> Event handling improvement
> Filter, Label, Error Stream
> At-least-once semantics in forwarding
> Add require_ack_response parameter
> HTTP RPC based management
> Latest release is v0.12.11
Architecture (v0.12 or later)

[Diagram: the core Engine (not pluggable) with pluggable Input (Forward, File tail, ...), Filter (grep, record_transformer, ...), Buffer (File, Memory), Output (Forward, File, ...), Formatter and Parser plugins]
v1 configuration
> hash, array and enum types are added
> hash and array are JSON
> Embed Ruby code using "#{}",
> easy to set variable values: "#{ENV['KEY']}"
> Add :secret option to mask parameters
> "@" prefix for built-in parameters
> @type, @id and @log_level
New v1 formats
> Easy to write complex values
> No trick or additional work for common cases
<source>
  @type my_tail
  keys ["k1", "k2", "k3"]
</source>

<match **>
  @type my_filter
  add_keys {"k1" : "v1"}
</match>

<filter **>
  @type my_filter
  env "#{ENV['KEY']}"
</filter>
Hash, Array, etc:

Embedded Ruby code:
• Socket.gethostname
• `command`
• etc...
:secret option
> For masking sensitive parameters
> In fluentd logs and in_monitor_agent
2015-05-29 19:50:10 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type forward
  </source>
  <match http.**>
    @type test
    sensitive_param xxxxxx
  </match>
</ROOT>
config_param :sensitive_param, :string, :secret => true
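To illustrate what :secret does, here is a minimal plain-Ruby sketch (not Fluentd's actual implementation, and dump_config is a made-up name) of masking secret parameters when a configuration is dumped to logs:

```ruby
# Minimal sketch of :secret-style masking: parameters declared
# secret are replaced with "xxxxxx" when the config is printed.
def dump_config(params, secret_keys)
  params.map { |key, value|
    shown = secret_keys.include?(key) ? "xxxxxx" : value
    "#{key} #{shown}"
  }.join("\n")
end

params = { "host" => "db.example.com", "password" => "s3cret" }
puts dump_config(params, ["password"])
# host db.example.com
# password xxxxxx
```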
> Apply a filtering routine to the event stream
> No more tag tricks; a filter cannot modify the tag
v0.10:

<match access.**>
  type record_reformer
  tag reformed.${tag}
</match>
<match reformed.**>
  type growthforecast
</match>

v0.12:

<filter access.**>
  @type record_transformer
  …
</filter>
<match access.**>
  @type growthforecast
</match>
Filter
Processing pipeline comparison

[Diagram: in v0.10, Output → Engine → Output chains need one transaction per emit; in v0.12, Filter and Output run in a single transaction]
> Mutate events
> http://docs.fluentd.org/articles/filter_record_transformer
<filter event.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match event.**>
  @type mongodb
</match>
Filter: record_transformer
> Grep event streams
> http://docs.fluentd.org/articles/filter_grep
<filter event.**>
  @type grep
  regexp1 message cool
  regexp2 hostname ^web\d+\.example\.com$
  exclude1 message uncool
</filter>

<match event.**>
  @type mongodb
</match>
Filter: grep
> Print events to stdout
> No need for the copy + stdout plugin combo!
> http://docs.fluentd.org/articles/filter_stdout
<filter event.**>
  @type stdout
</filter>

<match event.**>
  @type mongodb
</match>
Filter: stdout
> Override filter method
module Fluent
  class AddTagFilter < Filter
    # Same as other plugins: initialize, configure, start, shutdown
    # Define configurations with the config_param utilities
    Fluent::Plugin.register_filter('add_tag', self)

    def filter(tag, time, record)
      # Process record
      record["tag"] = tag
      # Return the processed record;
      # if nil is returned, the record is ignored
      record
    end
  end
end
Filter: Plugin development 1
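Assuming the class above registers itself as add_tag (a hypothetical plugin name) and the file is installed where Fluentd can load it, it could be used like any other filter:

```
<filter app.**>
  @type add_tag
</filter>

<match app.**>
  @type stdout
</match>
```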
> Override filter_stream method
module Fluent
  class AddTagFilter < Filter
    def filter_stream(tag, es)
      new_es = MultiEventStream.new
      es.each { |time, record|
        begin
          record["tag"] = tag
          new_es.add(time, record)
        rescue => e
          router.emit_error_event(tag, time, record, e)
        end
      }
      new_es
    end
  end
end
Filter: Plugin development 2
> Internal event routing
> Redirect events to another group
> Much easier to group and share plugins
v0.10:

<source>
  @type forward
</source>
<match app1.**>
  @type s3
</match>
…

v0.12:

<source>
  @type forward
  @label @APP1
</source>
<label @APP1>
  <match access.**>
    @type s3
  </match>
</label>
Label
> Use router.emit instead of Engine.emit
> The Engine#emit API is deprecated

v0.10:

tag = ""
time = Engine.now
record = {…}
Engine.emit(tag, time, record)

v0.12:

tag = ""
time = Engine.now
record = {…}
router.emit(tag, time, record)
Label : Need to update plugin
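A plugin that must keep working on v0.10 while adopting the v0.12 API can fall back to the legacy emitter when #router is missing. The sketch below is a self-contained plain-Ruby illustration of that pattern (DummyEngine, MyOutput and emit_event are made-up names, not Fluentd classes):

```ruby
# Stand-in for the legacy Engine-style emitter (v0.10).
module DummyEngine
  def self.emit(tag, time, record)
    [tag, time, record]
  end
end

class MyOutput
  # On v0.12, Fluentd defines #router; define a fallback only
  # when it is missing, so the same code runs on both versions.
  def router
    DummyEngine
  end unless method_defined?(:router)

  def emit_event(tag, time, record)
    router.emit(tag, time, record)
  end
end

MyOutput.new.emit_event("app.log", 0, { "k" => "v" })
```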
> Redirect events to another label
<source>
  @type forward
  @label @RAW
</source>
Label: relabel output
<label @RAW>
  <match **>
    @type copy
    <store>
      @type flowcounter
    </store>
    <store>
      @type relabel
      @label @MAIN
    </store>
  </match>
</label>

<label @MAIN>
  <match access.**>
    @type s3
  </match>
</label>
Error stream with Label
> Can handle errors at the record level
> router.emit_error_event(tag, time, record, error)
[Diagram: Input sends chunk1 ({"event":1, ...} to {"event":3, ...}) and chunk2 ({"event":4, ...} to {"event":6, ...}) to Output; most records succeed (OK) but one fails (ERROR!)]
<label @ERROR>
  <match **>
    type file
    ...
  </match>
</label>
Error stream
The built-in @ERROR label is used when an error occurs in "emit"
Support at-least-once semantics
> Delivery guarantees in failure scenarios
> At-most-once: messages may be lost
> At-least-once: messages may be duplicated
> Exactly-once: no loss and no duplication
> Fluentd supports at-most-once in v0.10
> Fluentd supports at-least-once since v0.12!
> Set the require_ack_response parameter
At-most-once and At-least-once
At-least-once (may be duplicated):

<match app.**>
  @type forward
  require_ack_response
</match>

At-most-once (may be lost):

<match app.**>
  @type forward
</match>

[Diagram: with require_ack_response, a lost ack (Error!) causes a resend, so events may be duplicated; without it, a failed transfer (Error!) may lose events]
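A toy simulation (not Fluentd code; send_chunk is a made-up helper) of why at-least-once delivery can duplicate events: the chunk reaches the receiver, but the ack is lost, so the sender retries and the same chunk arrives twice.

```ruby
# Simulate one delivery attempt: the chunk always arrives, but the
# ack may be lost, in which case the sender sees a failure.
def send_chunk(received, chunk, ack_lost)
  received << chunk        # delivery itself succeeds
  !ack_lost                # ack lost -> sender must retry
end

received = []
chunk = { "event" => 1 }
acked = send_chunk(received, chunk, true)                 # ack lost
acked = send_chunk(received, chunk, false) unless acked   # retry
received.length  # => 2: the same chunk was delivered twice
```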
HTTP RPC based management
> Use an HTTP/JSON API instead of signals
> For Windows and JRuby support
> The API follows an HTTP RPC style, not REST
> See https://api.slack.com/web#basics
> Enabled by rpc_endpoint in <system>
> Plan to add more APIs
> stop input plugins, check plugins, etc.
Supported RPCs
> /api/processes.interruptWorkers
> /api/processes.killWorkers
> Same as SIGINT and SIGTERM
> /api/plugins.flushBuffers
> Same as SIGUSR1
> /api/config.reload
> Same as SIGHUP
RPC example
> Configuration
> Curl
<system>
  rpc_endpoint 127.0.0.1:24444
</system>
$ curl http://127.0.0.1:24444/api/plugins.flushBuffers
{"ok":true}
Ecosystem
Most of the ecosystem is now v0.12 based
> Treasure Agent
> v2.2 ships with v0.12
> docs.fluentd.org is now v0.12 based
> You can see the v0.10 documents via the v0.10 prefix
> http://docs.fluentd.org/v0.10/articles/quickstart
> If the plugins you use don't support v0.12 features, please contribute!
Roadmap
> v0.10 (old stable)
> v0.12 (current stable) <- Now!
> Filter / Label / At-least-once / HTTP RPC
> v0.14 (summer, 2015)
> New plugin APIs, ServerEngine, Time…
> v1 (fall/winter, 2015)
> Fix new features / APIs
https://github.com/fluent/fluentd/wiki/V1-Roadmap
https://jobs.lever.co/treasure-data
Cloud service for the entire data pipeline. We’re hiring!