Fluentd unified logging layer

Kiyoto Tamura Nov 17, 2014 RubyConf 2014 Fluentd Unified Logging Layer

description

RubyConf 2014: Building the Unified Logging Layer with Fluentd and Ruby

Transcript of Fluentd unified logging layer

Page 1: Fluentd   unified logging layer

Kiyoto Tamura, Nov 17, 2014

RubyConf 2014

Fluentd: Unified Logging Layer

Page 2: Fluentd   unified logging layer

whoami

Kiyoto Tamura

GitHub/Twitter: kiyoto/kiyototamura

Treasure Data, Inc.

Director of Developer Relations

Fluentd maintainer

2

Page 3: Fluentd   unified logging layer

a ruby n00b

Page 4: Fluentd   unified logging layer

Fluentd n00b too

Page 5: Fluentd   unified logging layer

why me?

Busy writing code! Just gave a talk!

I’m giving a talk! Busy writing code!

Busy as CTO! San Diego’s nice!

Page 6: Fluentd   unified logging layer

What’s Fluentd?

An extensible & reliable data collection tool

simple core + plugins

buffering, HA (failover), load balance, etc.

like syslogd
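To make "data collection" concrete: every event Fluentd handles is a tag, a timestamp, and a JSON-like record, and application code can push events to a local Fluentd (the in_forward input shown later, port 24224) via the fluent-logger gem. A minimal sketch, with an illustrative tag and record:

require 'fluent-logger'

# connect once to the local Fluentd forward input (24224 is the default port)
Fluent::Logger::FluentLogger.open(nil, :host => 'localhost', :port => 24224)

# emit one event: tag "myapp.access", current time, JSON-like record
Fluent::Logger.post("myapp.access", {"agent" => "foo", "status" => 200})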

Page 7: Fluentd   unified logging layer

data collection tool

Page 8: Fluentd   unified logging layer

[Diagram: access logs (Apache, frontend), app logs and system logs (syslogd, backend), and MySQL, all inside "Your system", are wired to MongoDB and Hadoop (analysis), Amazon S3 (archiving), and Blueflood (metrics) through a tangle of bash/ruby/python scripts, rsync, cron jobs, custom loggers, log files, and other custom scripts]

✓ duplicated code for error handling...

✓ messy code for the retry mechanism...

Page 9: Fluentd   unified logging layer

(this is painful!!!)

Page 10: Fluentd   unified logging layer

[Diagram: the same sources (Apache access logs from the frontend; app logs, system logs, and syslogd from the backend; MySQL) now flow into a single "filter / buffer / route" layer inside your system, which feeds MongoDB and Hadoop (analysis), Amazon S3 (archiving), and Blueflood (metrics)]

Page 11: Fluentd   unified logging layer

extensible

Page 12: Fluentd   unified logging layer

Core / Plugins

12

Core:

• Divide & Conquer

• Buffering & Retries

• Error Handling

• Message Routing

• Parallelism

Plugins:

• Read Data

• Parse Data

• Buffer Data

• Write Data

• Format Data

Page 13: Fluentd   unified logging layer

Core / Plugins

13

Core (Common Concerns):

• Divide & Conquer

• Buffering & Retries

• Error Handling

• Message Routing

• Parallelism

Plugins (Use Case Specific):

• Read Data

• Parse Data

• Buffer Data

• Write Data

• Format Data

Page 14: Fluentd   unified logging layer

reliable

Page 15: Fluentd   unified logging layer

reliable data transfer

Page 16: Fluentd   unified logging layer

Divide & Conquer & Retry

[Diagram: the stream is divided into chunks; each chunk is transferred independently and retried on error]

Page 17: Fluentd   unified logging layer

reliable process

Page 18: Fluentd   unified logging layer

This?

18

Page 19: Fluentd   unified logging layer

Or this?

19

Page 20: Fluentd   unified logging layer

M × N → M + N

Instead of wiring every source to every destination (M × N connections), each source and each destination talks only to the buffer/filter/route layer, so you maintain M + N connections.

[Diagram: Apache access logs (frontend), app and system logs via syslogd (backend), and MySQL databases flow through one buffer/filter/route layer into MongoDB and Hadoop (analysis), Amazon S3 (archiving), and Nagios (alerting)]

Page 21: Fluentd   unified logging layer

use cases

Page 22: Fluentd   unified logging layer

Simple Forwarding

22

Page 23: Fluentd   unified logging layer

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
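The <match backend.*> pattern picks up both the tailed Apache log (tagged backend.apache) and any events arriving on port 24224 whose tag starts with "backend.", and writes them all to MongoDB.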

Page 24: Fluentd   unified logging layer

Less Simple Forwarding

24

Page 25: Fluentd   unified logging layer

Lambda Architecture

25

Page 26: Fluentd   unified logging layer

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
<match web.*>
  type copy

  <store>
    type elasticsearch
    logstash_format true
  </store>

  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
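The copy output duplicates every matched event to each <store>, so the same stream lands both in Elasticsearch (the interactive "speed" layer) and in HDFS via WebHDFS (the "batch" layer), which is what makes this a Lambda Architecture setup.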

Page 27: Fluentd   unified logging layer

CEP for Stream Processing

27

Page 28: Fluentd   unified logging layer

Container Logging

28

Page 29: Fluentd   unified logging layer

Fluentd on Kubernetes

Page 30: Fluentd   unified logging layer

architecture

Page 31: Fluentd   unified logging layer

Internal Architecture

Input Parser Buffer Output Formatter

Page 32: Fluentd   unified logging layer

Internal Architecture

Input Parser Buffer Output Formatter

Input and Parser are “input-ish”; Output and Formatter are “output-ish”

Page 33: Fluentd   unified logging layer

Input plugins

HTTP+JSON (in_http) File tail (in_tail) Syslog (in_syslog) ...

✓ Receive logs

✓ Or pull logs from data sources

✓ non-blocking

Input

Page 34: Fluentd   unified logging layer

Input plugins

module Fluent
  class NewTailInput < Input
    Plugin.register_input('tail', self)

    def initialize
      super
      @paths = []
      @tails = {}
    end

    # Little more code
  end
end
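Plugin.register_input('tail', self) is what ties the class to configuration: a <source> section with "type tail" is resolved to this NewTailInput class.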

Page 35: Fluentd   unified logging layer

Input plugins

module Fluent
  class NewTailInput < Input
    Plugin.register_input('tail', self)

    def initialize
      super
      @paths = []
      @tails = {}
    end

    config_param :path, :string
    config_param :tag, :string
    config_param :rotate_wait, :time, :default => 5
    config_param :pos_file, :string, :default => nil
    config_param :read_from_head, :bool, :default => false
    config_param :refresh_interval, :time, :default => 60

    attr_reader :paths

    def configure(conf)
      super

      @paths = @path.split(',').map {|path| path.strip }
      if @paths.empty?
        raise ConfigError, "tail: 'path' parameter is required on tail input"
      end

      unless @pos_file
        $log.warn "'pos_file PATH' parameter is not set to a 'tail' source."
        $log.warn "this parameter is highly recommended to save the position to resume tailing."
      end

      configure_parser(conf)
      configure_tag

      @multiline_mode = conf['format'] == 'multiline'
      @receive_handler = if @multiline_mode
                           method(:parse_multilines)
                         else
                           method(:parse_singleline)
                         end
    end

    def configure_parser(conf)
      @parser = TextParser.new
      @parser.configure(conf)
    end

    def configure_tag
      if @tag.index('*')
        @tag_prefix, @tag_suffix = @tag.split('*')
        @tag_suffix ||= ''
      else
        @tag_prefix = nil
        @tag_suffix = nil
      end
    end

    def start
      if @pos_file
        @pf_file = File.open(@pos_file, File::RDWR|File::CREAT, DEFAULT_FILE_PERMISSION)
        @pf_file.sync = true
        @pf = PositionFile.parse(@pf_file)
      end

      @loop = Coolio::Loop.new
      refresh_watchers

      @refresh_trigger = TailWatcher::TimerWatcher.new(@refresh_interval, true, log, &method(:refresh_watchers))
      @refresh_trigger.attach(@loop)
      @thread = Thread.new(&method(:run))
    end

    def shutdown
      @refresh_trigger.detach if @refresh_trigger && @refresh_trigger.attached?

      stop_watchers(@tails.keys, true)
      @loop.stop rescue nil # when all watchers are detached, `stop` raises RuntimeError. We can ignore this exception.
      @thread.join
      @pf_file.close if @pf_file
    end

    def expand_paths
      date = Time.now
      paths = []
      @paths.each { |path|
        path = date.strftime(path)
        if path.include?('*')
          paths += Dir.glob(path)
        else
          # When file is not created yet, Dir.glob returns an empty array. So just add when path is static.
          paths << path
        end
      }
      paths
    end

    # in_tail with '*' path doesn't check rotation file equality at refresh phase.
    # So you should not use '*' path when your logs will be rotated by another tool.
    # It will cause log duplication after updated watch files.
    # In such case, you should separate log directory and specify two paths in path parameter.
    # e.g. path /path/to/dir/*,/path/to/rotated_logs/target_file
    def refresh_watchers
      target_paths = expand_paths
      existence_paths = @tails.keys

      unwatched = existence_paths - target_paths
      added = target_paths - existence_paths

700 lines!

Page 36: Fluentd   unified logging layer

Input plugins

module Fluent
  class TcpInput < SocketUtil::BaseInput
    Plugin.register_input('tcp', self)

    config_set_default :port, 5170
    config_param :delimiter, :string, :default => "\n" # syslog family add "\n" to each message and this seems only way to split messages in tcp stream

    def listen(callback)
      log.debug "listening tcp socket on #{@bind}:#{@port}"
      Coolio::TCPServer.new(@bind, @port, SocketUtil::TcpHandler, log, @delimiter, callback)
    end
  end
end

Page 37: Fluentd   unified logging layer

Input plugins

class BaseInput < Fluent::Input
  # some code
  def on_message(msg, addr)
    @parser.parse(msg) { |time, record|
      unless time && record
        log.warn "pattern not match: #{msg.inspect}"
        return
      end

      record[@source_host_key] = addr[3] if @source_host_key
      Engine.emit(@tag, time, record)
    }
  end
  # some code
end

Page 38: Fluentd   unified logging layer

Input plugins

class BaseInput < Fluent::Input
  # some code
  def on_message(msg, addr)
    @parser.parse(msg) { |time, record|
      unless time && record
        log.warn "pattern not match: #{msg.inspect}"
        return
      end

      record[@source_host_key] = addr[3] if @source_host_key
      Engine.emit(@tag, time, record)
    }
  end
  # some code
end

Page 39: Fluentd   unified logging layer

Parser plugins

JSON Regexp Apache/Nginx/Syslog CSV/TSV, etc.

✓ Parse into JSON

✓ Common formats out of the box

✓ v0.10.46 and above

Parser

Page 40: Fluentd   unified logging layer

Parser plugins

<source>
  type tcp
  tag tcp.data
  format /^(?<field_1>\d+) (?<field_2>\w+)/
</source>
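For example, with this format an incoming line such as "123 hello" (an illustrative input, not from the slides) is parsed into the record {"field_1" => "123", "field_2" => "hello"} and emitted with tag tcp.data.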

Page 41: Fluentd   unified logging layer

Parser plugins

def call(text)
  m = @regexp.match(text)
  # some code
  time = nil
  record = {}

  m.names.each {|name|
    if value = m[name]
      case name
      when "time"
        time = @mutex.synchronize { @time_parser.parse(value) }
      else
        record[name] = if @type_converters.nil?
                         value
                       else
                         convert_type(name, value)
                       end
      end
    end
  }
  # some code
end

Page 42: Fluentd   unified logging layer

Buffer plugins

✓ Improve performance

✓ Provide reliability

✓ Provide thread-safety

Memory (buf_memory) File (buf_file)

Buffer

Page 43: Fluentd   unified logging layer

Buffer plugins

✓ Chunk = adjustable unit of data

✓ Buffer = Queue of chunks

[Diagram: Input → chunk | chunk | chunk → Output]
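A minimal sketch of how these chunk and queue settings surface in configuration, assuming a buffered output such as out_file with made-up paths and limits (parameter names follow the v0.10/v0.12 buffered output API):

<match backend.*>
  type file
  path /var/log/fluent/backend

  # file buffer survives restarts; buffer_path is hypothetical
  buffer_type file
  buffer_path /var/log/fluent/backend.*.buffer
  # size of one chunk, and max number of chunks queued in the buffer
  buffer_chunk_limit 8m
  buffer_queue_limit 64
  # how often chunks are flushed, and how failed flushes are retried
  flush_interval 60s
  retry_wait 1s
  retry_limit 17
</match>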

Page 44: Fluentd   unified logging layer

Output plugins

✓ Write to external systems

✓ Buffered & Non-buffered

✓ 200+ plugins

Output

File (out_file) Amazon S3 (out_s3) MongoDB (out_mongo) ...
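A minimal sketch of a custom buffered output under the pre-1.0 plugin API used throughout this talk; the plugin name stdout_example and its behavior are invented for illustration, and it assumes it runs inside Fluentd, where msgpack is already loaded:

require 'json'

module Fluent
  class StdoutExampleOutput < BufferedOutput
    # hypothetical plugin name, for illustration only
    Plugin.register_output('stdout_example', self)

    # called for every event; the returned string is appended to the current buffer chunk
    def format(tag, time, record)
      [tag, time, record].to_msgpack
    end

    # called when a chunk is flushed; may be retried, so it should tolerate re-runs
    def write(chunk)
      chunk.msgpack_each do |tag, time, record|
        puts "#{tag} #{Time.at(time).utc} #{record.to_json}"
      end
    end
  end
end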

Page 45: Fluentd   unified logging layer

Output plugins

class FileOutput < TimeSlicedOutput
  Plugin.register_output('file', self)
  # some code
  def write(chunk)
    path = generate_path(chunk)
    FileUtils.mkdir_p File.dirname(path)

    case @compress
    when nil
      File.open(path, "a", DEFAULT_FILE_PERMISSION) {|f|
        chunk.write_to(f)
      }
    when :gz
      File.open(path, "a", DEFAULT_FILE_PERMISSION) {|f|
        gz = Zlib::GzipWriter.new(f)
        chunk.write_to(gz)
        gz.close
      }
    end

    return path # for test
  end
  # more code

Page 46: Fluentd   unified logging layer

Formatter plugins

✓ Format output

✓ Only partially supported for now

✓ v0.10.49 and above

JSON CSV/TSV “single value”

Formatter

Page 47: Fluentd   unified logging layer

Formatter plugins

class SingleValueFormatter
  include Configurable

  config_param :message_key, :string, :default => 'message'
  config_param :add_newline, :bool, :default => true

  def format(tag, time, record)
    text = record[@message_key].to_s
    text << "\n" if @add_newline
    text
  end
end
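A sketch of how a formatter is selected from configuration, assuming an output that accepts the format parameter (out_file does in the formatter-enabled releases; the path is made up):

<match backend.*>
  type file
  path /var/log/fluent/messages
  format single_value
  message_key message
</match>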

Page 48: Fluentd   unified logging layer

Internal Architecture

Input Parser Buffer Output Formatter

Page 49: Fluentd   unified logging layer

Adding Filter in v0.12!

Input Parser Filter Buffer Output Formatter
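A sketch of the v0.12 <filter> directive, using the bundled grep filter; the tag pattern, field name, and regexp are illustrative:

<filter backend.**>
  # keep only events whose "message" field matches WARN
  type grep
  regexp1 message WARN
</filter>

<match backend.**>
  type mongo
  database fluent
  collection test
</match>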

Page 50: Fluentd   unified logging layer

Roadmap

50

Nov 2014 → May 2015

• v0.12: filter, label

• v0.14: plugin API, ServerEngine

• V1.0!? (we can use help!)

Page 51: Fluentd   unified logging layer

goodies

Page 52: Fluentd   unified logging layer

fluentd-ui

52

Page 53: Fluentd   unified logging layer

Treasure Agent

• Treasure Data distribution of Fluentd

• including Ruby, core libraries, and QA’ed third-party plugins

• rpm/deb/dmg

• 2.1.2 is released TODAY with fluentd-ui

53

Page 54: Fluentd   unified logging layer

fluentd-forwarder

• Forwarding agent written in Go

• mainly for Windows support

• less mature than Fluentd

• Bundles TCP input/output and TD output

• No plugin mechanism

54

Page 55: Fluentd   unified logging layer

Thank you!

[email protected]

@kiyototamura