Fluentd unified logging layer

Kiyoto Tamura Nov 17, 2014 RubyConf 2014 Fluentd Unified Logging Layer

description

RubyConf 2014: Building the Unified Logging Layer with Fluentd and Ruby

Transcript of Fluentd unified logging layer

Page 1: Fluentd   unified logging layer

Kiyoto Tamura, Nov 17, 2014

RubyConf 2014

Fluentd: Unified Logging Layer

Page 2: Fluentd   unified logging layer

whoami

Kiyoto Tamura

GitHub/Twitter: kiyoto/kiyototamura

Treasure Data, Inc.

Director of Developer Relations

Fluentd maintainer

2

Page 3: Fluentd   unified logging layer

a ruby n00b

Page 4: Fluentd   unified logging layer

Fluentd n00b too

Page 5: Fluentd   unified logging layer

why me?

Busy writing code! Just gave a talk!

I’m giving a talk! Busy writing code!

Busy as CTO! San Diego’s nice!

Page 6: Fluentd   unified logging layer

What’s Fluentd?

An extensible & reliable data collection tool

simple core + plugins

buffering, HA (failover), load balance, etc.

like syslogd
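To make "data collection" concrete: every event Fluentd handles is a tag, a timestamp, and a JSON-like record, and application code can push events to a local Fluentd (the in_forward input shown later, port 24224) via the fluent-logger gem. A minimal sketch, with an illustrative tag and record:

require 'fluent-logger'

# connect once to the local Fluentd forward input (24224 is the default port)
Fluent::Logger::FluentLogger.open(nil, :host => 'localhost', :port => 24224)

# emit one event: tag "myapp.access", current time, JSON-like record
Fluent::Logger.post("myapp.access", {"agent" => "foo", "status" => 200})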

Page 7: Fluentd   unified logging layer

data collection tool

Page 8: Fluentd   unified logging layer

[Diagram: access logs (Apache, frontend), app logs and system logs (syslogd, backend), and MySQL, all inside "Your system", are wired to MongoDB and Hadoop (analysis), Amazon S3 (archiving), and Blueflood (metrics) through a tangle of bash/ruby/python scripts, rsync, cron jobs, custom loggers, log files, and other custom scripts]

✓ duplicated code for error handling...

✓ messy code for the retry mechanism...

Page 9: Fluentd   unified logging layer

(this is painful!!!)

Page 10: Fluentd   unified logging layer

[Diagram: the same sources (Apache access logs from the frontend; app logs, system logs, and syslogd from the backend; MySQL) now flow into a single "filter / buffer / route" layer inside your system, which feeds MongoDB and Hadoop (analysis), Amazon S3 (archiving), and Blueflood (metrics)]

Page 11: Fluentd   unified logging layer

extensible

Page 12: Fluentd   unified logging layer

Core / Plugins

12

Core:

• Divide & Conquer

• Buffering & Retries

• Error Handling

• Message Routing

• Parallelism

Plugins:

• Read Data

• Parse Data

• Buffer Data

• Write Data

• Format Data

Page 13: Fluentd   unified logging layer

Core / Plugins

13

Core (Common Concerns):

• Divide & Conquer

• Buffering & Retries

• Error Handling

• Message Routing

• Parallelism

Plugins (Use Case Specific):

• Read Data

• Parse Data

• Buffer Data

• Write Data

• Format Data

Page 14: Fluentd   unified logging layer

reliable

Page 15: Fluentd   unified logging layer

reliable data transfer

Page 16: Fluentd   unified logging layer

Divide & Conquer & Retry

[Diagram: the stream is divided into chunks; each chunk is transferred independently and retried on error]

Page 17: Fluentd   unified logging layer

reliable process

Page 18: Fluentd   unified logging layer

This?

18

Page 19: Fluentd   unified logging layer

Or this?

19

Page 20: Fluentd   unified logging layer

M × N → M + N

Instead of wiring every source to every destination (M × N connections), each source and each destination talks only to the buffer/filter/route layer, so you maintain M + N connections.

[Diagram: Apache access logs (frontend), app and system logs via syslogd (backend), and MySQL databases flow through one buffer/filter/route layer into MongoDB and Hadoop (analysis), Amazon S3 (archiving), and Nagios (alerting)]

Page 21: Fluentd   unified logging layer

use cases

Page 22: Fluentd   unified logging layer

Simple Forwarding

22

Page 23: Fluentd   unified logging layer

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
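The <match backend.*> pattern picks up both the tailed Apache log (tagged backend.apache) and any events arriving on port 24224 whose tag starts with "backend.", and writes them all to MongoDB.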

Page 24: Fluentd   unified logging layer

Less Simple Forwarding

24

Page 25: Fluentd   unified logging layer

Lambda Architecture

25

Page 26: Fluentd   unified logging layer

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
<match web.*>
  type copy

  <store>
    type elasticsearch
    logstash_format true
  </store>

  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
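The copy output duplicates every matched event to each <store>, so the same stream lands both in Elasticsearch (the interactive "speed" layer) and in HDFS via WebHDFS (the "batch" layer), which is what makes this a Lambda Architecture setup.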

Page 27: Fluentd   unified logging layer

CEP for Stream Processing

27

Page 28: Fluentd   unified logging layer

Container Logging

28

Page 29: Fluentd   unified logging layer

Fluentd on Kubernetes

Page 30: Fluentd   unified logging layer

architecture

Page 31: Fluentd   unified logging layer

Internal Architecture

Input Parser Buffer Output Formatter

Page 32: Fluentd   unified logging layer

Internal Architecture

Input Parser Buffer Output Formatter

Input and Parser are “input-ish”; Output and Formatter are “output-ish”

Page 33: Fluentd   unified logging layer

Input plugins

HTTP+JSON (in_http) File tail (in_tail) Syslog (in_syslog) ...

✓ Receive logs

✓ Or pull logs from data sources

✓ non-blocking

Input

Page 34: Fluentd   unified logging layer

Input plugins

module Fluent
  class NewTailInput < Input
    Plugin.register_input('tail', self)

    def initialize
      super
      @paths = []
      @tails = {}
    end

    # Little more code
  end
end
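Plugin.register_input('tail', self) is what ties the class to configuration: a <source> section with "type tail" is resolved to this NewTailInput class.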

Page 35: Fluentd   unified logging layer

Input plugins

module Fluent
  class NewTailInput < Input
    Plugin.register_input('tail', self)

    def initialize
      super
      @paths = []
      @tails = {}
    end

    config_param :path, :string
    config_param :tag, :string
    config_param :rotate_wait, :time, :default => 5
    config_param :pos_file, :string, :default => nil
    config_param :read_from_head, :bool, :default => false
    config_param :refresh_interval, :time, :default => 60

    attr_reader :paths

    def configure(conf)
      super

      @paths = @path.split(',').map {|path| path.strip }
      if @paths.empty?
        raise ConfigError, "tail: 'path' parameter is required on tail input"
      end

      unless @pos_file
        $log.warn "'pos_file PATH' parameter is not set to a 'tail' source."
        $log.warn "this parameter is highly recommended to save the position to resume tailing."
      end

      configure_parser(conf)
      configure_tag

      @multiline_mode = conf['format'] == 'multiline'
      @receive_handler = if @multiline_mode
                           method(:parse_multilines)
                         else
                           method(:parse_singleline)
                         end
    end

    def configure_parser(conf)
      @parser = TextParser.new
      @parser.configure(conf)
    end

    def configure_tag
      if @tag.index('*')
        @tag_prefix, @tag_suffix = @tag.split('*')
        @tag_suffix ||= ''
      else
        @tag_prefix = nil
        @tag_suffix = nil
      end
    end

    def start
      if @pos_file
        @pf_file = File.open(@pos_file, File::RDWR|File::CREAT, DEFAULT_FILE_PERMISSION)
        @pf_file.sync = true
        @pf = PositionFile.parse(@pf_file)
      end

      @loop = Coolio::Loop.new
      refresh_watchers

      @refresh_trigger = TailWatcher::TimerWatcher.new(@refresh_interval, true, log, &method(:refresh_watchers))
      @refresh_trigger.attach(@loop)
      @thread = Thread.new(&method(:run))
    end

    def shutdown
      @refresh_trigger.detach if @refresh_trigger && @refresh_trigger.attached?

      stop_watchers(@tails.keys, true)
      @loop.stop rescue nil # when all watchers are detached, `stop` raises RuntimeError. We can ignore this exception.
      @thread.join
      @pf_file.close if @pf_file
    end

    def expand_paths
      date = Time.now
      paths = []
      @paths.each { |path|
        path = date.strftime(path)
        if path.include?('*')
          paths += Dir.glob(path)
        else
          # When file is not created yet, Dir.glob returns an empty array. So just add when path is static.
          paths << path
        end
      }
      paths
    end

    # in_tail with '*' path doesn't check rotation file equality at refresh phase.
    # So you should not use '*' path when your logs will be rotated by another tool.
    # It will cause log duplication after updated watch files.
    # In such case, you should separate log directory and specify two paths in path parameter.
    # e.g. path /path/to/dir/*,/path/to/rotated_logs/target_file
    def refresh_watchers
      target_paths = expand_paths
      existence_paths = @tails.keys

      unwatched = existence_paths - target_paths
      added = target_paths - existence_paths

700 lines!

Page 36: Fluentd   unified logging layer

Input plugins

module Fluent
  class TcpInput < SocketUtil::BaseInput
    Plugin.register_input('tcp', self)

    config_set_default :port, 5170
    config_param :delimiter, :string, :default => "\n" # syslog family add "\n" to each message and this seems only way to split messages in tcp stream

    def listen(callback)
      log.debug "listening tcp socket on #{@bind}:#{@port}"
      Coolio::TCPServer.new(@bind, @port, SocketUtil::TcpHandler, log, @delimiter, callback)
    end
  end
end

Page 37: Fluentd   unified logging layer

Input plugins

class BaseInput < Fluent::Input
  # some code
  def on_message(msg, addr)
    @parser.parse(msg) { |time, record|
      unless time && record
        log.warn "pattern not match: #{msg.inspect}"
        return
      end

      record[@source_host_key] = addr[3] if @source_host_key
      Engine.emit(@tag, time, record)
    }
  end
  # some code
end

Page 38: Fluentd   unified logging layer

Input plugins

class BaseInput < Fluent::Input
  # some code
  def on_message(msg, addr)
    @parser.parse(msg) { |time, record|
      unless time && record
        log.warn "pattern not match: #{msg.inspect}"
        return
      end

      record[@source_host_key] = addr[3] if @source_host_key
      Engine.emit(@tag, time, record)
    }
  end
  # some code
end

Page 39: Fluentd   unified logging layer

Parser plugins

JSON Regexp Apache/Nginx/Syslog CSV/TSV, etc.

✓ Parse into JSON

✓ Common formats out of the box

✓ v0.10.46 and above

Parser

Page 40: Fluentd   unified logging layer

Parser plugins

<source>
  type tcp
  tag tcp.data
  format /^(?<field_1>\d+) (?<field_2>\w+)/
</source>
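For example, with this format an incoming line such as "123 hello" (an illustrative input, not from the slides) is parsed into the record {"field_1" => "123", "field_2" => "hello"} and emitted with tag tcp.data.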

Page 41: Fluentd   unified logging layer

Parser plugins

def call(text)
  m = @regexp.match(text)
  # some code
  time = nil
  record = {}

  m.names.each {|name|
    if value = m[name]
      case name
      when "time"
        time = @mutex.synchronize { @time_parser.parse(value) }
      else
        record[name] = if @type_converters.nil?
                         value
                       else
                         convert_type(name, value)
                       end
      end
    end
  }
  # some code
end

Page 42: Fluentd   unified logging layer

Buffer plugins

✓ Improve performance

✓ Provide reliability

✓ Provide thread-safety

Memory (buf_memory) File (buf_file)

Buffer

Page 43: Fluentd   unified logging layer

Buffer plugins

✓ Chunk = adjustable unit of data

✓ Buffer = Queue of chunks

[Diagram: Input → chunk | chunk | chunk → Output]
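A minimal sketch of how these chunk and queue settings surface in configuration, assuming a buffered output such as out_file with made-up paths and limits (parameter names follow the v0.10/v0.12 buffered output API):

<match backend.*>
  type file
  path /var/log/fluent/backend

  # file buffer survives restarts; buffer_path is hypothetical
  buffer_type file
  buffer_path /var/log/fluent/backend.*.buffer
  # size of one chunk, and max number of chunks queued in the buffer
  buffer_chunk_limit 8m
  buffer_queue_limit 64
  # how often chunks are flushed, and how failed flushes are retried
  flush_interval 60s
  retry_wait 1s
  retry_limit 17
</match>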

Page 44: Fluentd   unified logging layer

Output plugins

✓ Write to external systems

✓ Buffered & Non-buffered

✓ 200+ plugins

Output

File (out_file) Amazon S3 (out_s3) MongoDB (out_mongo) ...
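A minimal sketch of a custom buffered output under the pre-1.0 plugin API used throughout this talk; the plugin name stdout_example and its behavior are invented for illustration, and it assumes it runs inside Fluentd, where msgpack is already loaded:

require 'json'

module Fluent
  class StdoutExampleOutput < BufferedOutput
    # hypothetical plugin name, for illustration only
    Plugin.register_output('stdout_example', self)

    # called for every event; the returned string is appended to the current buffer chunk
    def format(tag, time, record)
      [tag, time, record].to_msgpack
    end

    # called when a chunk is flushed; may be retried, so it should tolerate re-runs
    def write(chunk)
      chunk.msgpack_each do |tag, time, record|
        puts "#{tag} #{Time.at(time).utc} #{record.to_json}"
      end
    end
  end
end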

Page 45: Fluentd   unified logging layer

Output plugins

class FileOutput < TimeSlicedOutput
  Plugin.register_output('file', self)
  # some code
  def write(chunk)
    path = generate_path(chunk)
    FileUtils.mkdir_p File.dirname(path)

    case @compress
    when nil
      File.open(path, "a", DEFAULT_FILE_PERMISSION) {|f|
        chunk.write_to(f)
      }
    when :gz
      File.open(path, "a", DEFAULT_FILE_PERMISSION) {|f|
        gz = Zlib::GzipWriter.new(f)
        chunk.write_to(gz)
        gz.close
      }
    end

    return path # for test
  end
  # more code

Page 46: Fluentd   unified logging layer

Formatter plugins

✓ Format output

✓ Only partially supported for now

✓ v0.10.49 and above

JSON CSV/TSV “single value”

Formatter

Page 47: Fluentd   unified logging layer

Formatter plugins

class SingleValueFormatter
  include Configurable

  config_param :message_key, :string, :default => 'message'
  config_param :add_newline, :bool, :default => true

  def format(tag, time, record)
    text = record[@message_key].to_s
    text << "\n" if @add_newline
    text
  end
end
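A sketch of how a formatter is selected from configuration, assuming an output that accepts the format parameter (out_file does in the formatter-enabled releases; the path is made up):

<match backend.*>
  type file
  path /var/log/fluent/messages
  format single_value
  message_key message
</match>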

Page 48: Fluentd   unified logging layer

Internal Architecture

Input Parser Buffer Output Formatter

Page 49: Fluentd   unified logging layer

Adding Filter in v0.12!

Input Parser Filter Buffer Output Formatter
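A sketch of the v0.12 <filter> directive, using the bundled grep filter; the tag pattern, field name, and regexp are illustrative:

<filter backend.**>
  # keep only events whose "message" field matches WARN
  type grep
  regexp1 message WARN
</filter>

<match backend.**>
  type mongo
  database fluent
  collection test
</match>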

Page 50: Fluentd   unified logging layer

Roadmap

50

Nov 2014 → May 2015

• v0.12: filter, label

• v0.14: plugin API, ServerEngine

• V1.0!? (we can use help!)

Page 51: Fluentd   unified logging layer

goodies

Page 52: Fluentd   unified logging layer

fluentd-ui

52

Page 53: Fluentd   unified logging layer

Treasure Agent

• Treasure Data distribution of Fluentd

• including Ruby, core libraries, and QA’ed third-party plugins

• rpm/deb/dmg

• 2.1.2 is released TODAY with fluentd-ui

53

Page 54: Fluentd   unified logging layer

fluentd-forwarder

• Forwarding agent written in Go

• mainly for Windows support

• less mature than Fluentd

• Bundles TCP input/output and TD output

• No plugin mechanism

54

Page 55: Fluentd   unified logging layer

Thank you!

[email protected]

@kiyototamura