Masahiro Nakagawa
June 1, 2015
Fluentd meetup 2015 Summer
Fluentd - v0.12 master guide -
#fluentdmeetup
Who are you?
> Masahiro Nakagawa
> github/twitter: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> I love OSS :)
> D language - Phobos committer
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of Presto Source Code Reading / meetup
> etc…
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
What’s Fluentd?
> Data collector for a unified logging layer
> Streaming data transfer based on JSON
> Written in Ruby
> Various gem-based plugins
> http://www.fluentd.org/plugins
> Working in production
> http://www.fluentd.org/testimonials
v0.10 (old stable)
> Mainly for log forwarding
> with good performance
> working in production
> with td-agent 1 and td-agent 2.0 / 2.1
> Robust, but not good for log processing
Architecture (v0.10)

[Diagram: pluggable Input (Forward, HTTP, File tail, dstat, ...), Buffer (File, Memory) and Output (Forward, File, MongoDB, rewrite, ...) plugins around the core Engine]
v0.12 (current stable)
> v1 configuration by default
> Event handling improvement
> Filter, Label, Error Stream
> At-least-once semantics in forwarding
> Add require_ack_response parameter
> HTTP RPC based management
> Latest release is v0.12.11
Architecture (v0.12 or later)

[Diagram: the core Engine (not pluggable) with pluggable Input (Forward, File tail, ...), Filter (grep, record_transformer, ...), Buffer (File, Memory), Output (Forward, File, ...), Formatter and Parser plugins]
v1 configuration
> hash, array and enum types are added
> hash and array are JSON
> Embed Ruby code using "#{}",
> easy to set variable values: "#{ENV['KEY']}"
> Add :secret option to mask parameters
> "@" prefix for built-in parameters
> @type, @id and @log_level
New v1 formats
> Easy to write complex values
> No trick or additional work for common cases
<source>
  @type my_tail
  keys ["k1", "k2", "k3"]
</source>

<match **>
  @type my_filter
  add_keys {"k1" : "v1"}
</match>

<filter **>
  @type my_filter
  env "#{ENV['KEY']}"
</filter>
Hash, Array, etc:

Embedded Ruby code:
• Socket.gethostname
• `command`
• etc...
:secret option
> For masking sensitive parameters
> In fluentd logs and in_monitor_agent
2015-05-29 19:50:10 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type forward
  </source>
  <match http.**>
    @type test
    sensitive_param xxxxxx
  </match>
</ROOT>
config_param :sensitive_param, :string, :secret => true
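To illustrate what :secret does, here is a minimal plain-Ruby sketch (not Fluentd's actual implementation, and dump_config is a made-up name) of masking secret parameters when a configuration is dumped to logs:

```ruby
# Minimal sketch of :secret-style masking: parameters declared
# secret are replaced with "xxxxxx" when the config is printed.
def dump_config(params, secret_keys)
  params.map { |key, value|
    shown = secret_keys.include?(key) ? "xxxxxx" : value
    "#{key} #{shown}"
  }.join("\n")
end

params = { "host" => "db.example.com", "password" => "s3cret" }
puts dump_config(params, ["password"])
# host db.example.com
# password xxxxxx
```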
> Apply a filtering routine to the event stream
> No more tag tricks; a filter cannot modify the tag
v0.10:

<match access.**>
  type record_reformer
  tag reformed.${tag}
</match>
<match reformed.**>
  type growthforecast
</match>

v0.12:

<filter access.**>
  @type record_transformer
  …
</filter>
<match access.**>
  @type growthforecast
</match>
Filter
Processing pipeline comparison

[Diagram: in v0.10, Output → Engine → Output chains need one transaction per emit; in v0.12, Filter and Output run in a single transaction]
> Mutate events
> http://docs.fluentd.org/articles/filter_record_transformer
<filter event.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match event.**>
  @type mongodb
</match>
Filter: record_transformer
> Grep event streams
> http://docs.fluentd.org/articles/filter_grep
<filter event.**>
  @type grep
  regexp1 message cool
  regexp2 hostname ^web\d+\.example\.com$
  exclude1 message uncool
</filter>

<match event.**>
  @type mongodb
</match>
Filter: grep
> Print events to stdout
> No need for the copy + stdout plugin combo!
> http://docs.fluentd.org/articles/filter_stdout
<filter event.**>
  @type stdout
</filter>

<match event.**>
  @type mongodb
</match>
Filter: stdout
> Override filter method
module Fluent
  class AddTagFilter < Filter
    # Same as other plugins: initialize, configure, start, shutdown
    # Define configurations with the config_param utilities
    Fluent::Plugin.register_filter('add_tag', self)

    def filter(tag, time, record)
      # Process record
      record["tag"] = tag
      # Return the processed record;
      # if nil is returned, the record is ignored
      record
    end
  end
end
Filter: Plugin development 1
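Assuming the class above registers itself as add_tag (a hypothetical plugin name) and the file is installed where Fluentd can load it, it could be used like any other filter:

```
<filter app.**>
  @type add_tag
</filter>

<match app.**>
  @type stdout
</match>
```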
> Override filter_stream method
module Fluent
  class AddTagFilter < Filter
    def filter_stream(tag, es)
      new_es = MultiEventStream.new
      es.each { |time, record|
        begin
          record["tag"] = tag
          new_es.add(time, record)
        rescue => e
          router.emit_error_event(tag, time, record, e)
        end
      }
      new_es
    end
  end
end
Filter: Plugin development 2
> Internal event routing
> Redirect events to another group
> Much easier to group and share plugins
v0.10:

<source>
  @type forward
</source>
<match app1.**>
  @type s3
</match>
…

v0.12:

<source>
  @type forward
  @label @APP1
</source>
<label @APP1>
  <match access.**>
    @type s3
  </match>
</label>
Label
> Use router.emit instead of Engine.emit
> The Engine#emit API is deprecated

v0.10:

tag = ""
time = Engine.now
record = {…}
Engine.emit(tag, time, record)

v0.12:

tag = ""
time = Engine.now
record = {…}
router.emit(tag, time, record)
Label : Need to update plugin
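A plugin that must keep working on v0.10 while adopting the v0.12 API can fall back to the legacy emitter when #router is missing. The sketch below is a self-contained plain-Ruby illustration of that pattern (DummyEngine, MyOutput and emit_event are made-up names, not Fluentd classes):

```ruby
# Stand-in for the legacy Engine-style emitter (v0.10).
module DummyEngine
  def self.emit(tag, time, record)
    [tag, time, record]
  end
end

class MyOutput
  # On v0.12, Fluentd defines #router; define a fallback only
  # when it is missing, so the same code runs on both versions.
  def router
    DummyEngine
  end unless method_defined?(:router)

  def emit_event(tag, time, record)
    router.emit(tag, time, record)
  end
end

MyOutput.new.emit_event("app.log", 0, { "k" => "v" })
```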
> Redirect events to another label
<source>
  @type forward
  @label @RAW
</source>
Label: relabel output
<label @RAW>
  <match **>
    @type copy
    <store>
      @type flowcounter
    </store>
    <store>
      @type relabel
      @label @MAIN
    </store>
  </match>
</label>

<label @MAIN>
  <match access.**>
    @type s3
  </match>
</label>
Error stream with Label
> Can handle errors at the record level
> router.emit_error_event(tag, time, record, error)
[Diagram: Input sends chunk1 ({"event":1, ...} to {"event":3, ...}) and chunk2 ({"event":4, ...} to {"event":6, ...}) to Output; most records succeed (OK) but one fails (ERROR!)]
<label @ERROR>
  <match **>
    type file
    ...
  </match>
</label>
Error stream
The built-in @ERROR label is used when an error occurs in "emit"
Support at-least-once semantics
> Delivery guarantees in failure scenarios
> At-most-once: messages may be lost
> At-least-once: messages may be duplicated
> Exactly-once: no loss and no duplication
> Fluentd supports at-most-once in v0.10
> Fluentd supports at-least-once since v0.12!
> Set the require_ack_response parameter
At-most-once and At-least-once
At-least-once (may be duplicated):

<match app.**>
  @type forward
  require_ack_response
</match>

At-most-once (may be lost):

<match app.**>
  @type forward
</match>

[Diagram: with require_ack_response, a lost ack (Error!) causes a resend, so events may be duplicated; without it, a failed transfer (Error!) may lose events]
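A toy simulation (not Fluentd code; send_chunk is a made-up helper) of why at-least-once delivery can duplicate events: the chunk reaches the receiver, but the ack is lost, so the sender retries and the same chunk arrives twice.

```ruby
# Simulate one delivery attempt: the chunk always arrives, but the
# ack may be lost, in which case the sender sees a failure.
def send_chunk(received, chunk, ack_lost)
  received << chunk        # delivery itself succeeds
  !ack_lost                # ack lost -> sender must retry
end

received = []
chunk = { "event" => 1 }
acked = send_chunk(received, chunk, true)                 # ack lost
acked = send_chunk(received, chunk, false) unless acked   # retry
received.length  # => 2: the same chunk was delivered twice
```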
HTTP RPC based management
> Use an HTTP/JSON API instead of signals
> For Windows and JRuby support
> The API follows an HTTP RPC style, not REST
> See https://api.slack.com/web#basics
> Enabled by rpc_endpoint in <system>
> Plan to add more APIs
> stop input plugins, check plugins, etc.
Supported RPCs
> /api/processes.interruptWorkers
> /api/processes.killWorkers
> Same as SIGINT and SIGTERM
> /api/plugins.flushBuffers
> Same as SIGUSR1
> /api/config.reload
> Same as SIGHUP
RPC example
> Configuration
> Curl
<system>
  rpc_endpoint 127.0.0.1:24444
</system>
$ curl http://127.0.0.1:24444/api/plugins.flushBuffers
{"ok":true}
Ecosystem
Most of the ecosystem is now v0.12 based
> Treasure Agent
> v2.2 ships with v0.12
> docs.fluentd.org is now v0.12 based
> You can see the v0.10 documents via the v0.10 prefix
> http://docs.fluentd.org/v0.10/articles/quickstart
> If the plugins you use don't support v0.12 features, please contribute!
Roadmap
> v0.10 (old stable)
> v0.12 (current stable) <- Now!
> Filter / Label / At-least-once / HTTP RPC
> v0.14 (summer, 2015)
> New plugin APIs, ServerEngine, Time…
> v1 (fall/winter, 2015)
> Fix new features / APIs
https://github.com/fluent/fluentd/wiki/V1-Roadmap
https://jobs.lever.co/treasure-data
Cloud service for the entire data pipeline. We’re hiring!