Fluentd introduction at ipros

Fluentd Introduction Masahiro Nakagawa Treasure Data, Inc. Senior Software Engineer at iPROS Thursday, October 31, 13


Fluentd presentation slide at #iprostm http://atnd.org/events/44556

Transcript of Fluentd introduction at ipros

Page 1: Fluentd introduction at ipros

Fluentd Introduction

Masahiro Nakagawa

Treasure Data, Inc.

Senior Software Engineer

at iPROS

Thursday, October 31, 13

Page 2: Fluentd introduction at ipros

Who are you?

> Masahiro Nakagawa
  > @repeatedly / [email protected]

> Treasure Data, Inc.
  > Senior Software Engineer, since 2012/11

> Open Source Projects
  > D programming language
  > MessagePack, Fluentd, etc.

Page 3: Fluentd introduction at ipros

Structured logging

Reliable forwarding

Pluggable architecture

http://fluentd.org/

Page 4: Fluentd introduction at ipros

Agenda

> Background

> Overview

> Product Comparison

> Use cases


Page 5: Fluentd introduction at ipros

Background


Page 6: Fluentd introduction at ipros

Data Processing

[Diagram: pipeline stages Collect -> Store -> Process -> Visualize; other labels: Data source, Reporting, Monitoring]

Page 7: Fluentd introduction at ipros

Related Products

> Collect: ???
> Store / Process: Cloudera, Hortonworks, Treasure Data
> Visualize: Tableau, Excel, R

easier & shorter time

Page 8: Fluentd introduction at ipros


Page 9: Fluentd introduction at ipros

Before Fluentd

[Diagram: applications on Server1, Server2, Server3, ...; other labels: Fluent, Log Server]

High latency! Must wait for a day...

Page 10: Fluentd introduction at ipros

After Fluentd

[Diagram: applications on Server1, Server2, Server3, ...; Fluentd nodes in a row of three and a row of two]

In streaming!

Page 11: Fluentd introduction at ipros

Overview


Page 12: Fluentd introduction at ipros

In short

> Open-source log collector written in Ruby

> Uses the RubyGems ecosystem for plugins

It's like syslogd, but uses JSON for log messages

Page 13: Fluentd introduction at ipros

Example (apache to mongo)

[Diagram: Web Server -> tail -> Fluentd (event buffering) -> insert]

Log lines read from the web server:

127.0.0.1 - - [30/Oct/2013:07:26:27] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:30] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:32] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:40] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:27:01] "GET / ...
...

Resulting event:

2013-10-30 01:33:51 apache.log { "host": "127.0.0.1", "method": "GET", ... }
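The tail-and-parse step in this example can be sketched in plain Ruby. This is only an illustration, not the parser that in_tail uses: the `LINE_PATTERN` regular expression, the `parse_access_line` helper, the request path and status code in the sample line, and the choice to read the timestamp as UTC are all assumptions.

```ruby
require 'time'

# Hypothetical pattern for a simplified access-log line
# (an assumption, not Fluentd's built-in apache format).
LINE_PATTERN = /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<code>\d+)/

# Turn one log line into a [tag, time, record] triple,
# or nil if the line does not match the pattern.
def parse_access_line(line, tag)
  m = LINE_PATTERN.match(line)
  return nil unless m

  # The slide's timestamps carry no zone, so UTC is assumed here.
  time = Time.strptime("#{m[:time]} +0000", "%d/%b/%Y:%H:%M:%S %z")
  record = {
    "host"   => m[:host],
    "method" => m[:method],
    "path"   => m[:path],
    "code"   => m[:code].to_i
  }
  [tag, time.to_i, record]
end

# Sample line: the part after "GET /" is made up for the example.
sample = '127.0.0.1 - - [30/Oct/2013:07:26:27] "GET /index.html HTTP/1.1" 200 777'
p parse_access_line(sample, "apache.access")
```

Lines that do not match return nil, so a caller can decide whether to skip or report them.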

Page 14: Fluentd introduction at ipros

Event structure (log message)

✓ Time
  > default second unit
  > from data source or adding parsed time

✓ Tag
  > for message routing

✓ Record
  > JSON format
  > MessagePack internally
  > non-unstructured
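The three parts of an event can be modelled in a few lines of Ruby. This is a sketch for illustration, not Fluentd's internal representation: the `Event` struct and its `to_line` method are hypothetical names, and rendering the time in UTC is an assumption. Only JSON is shown because MessagePack is not part of the Ruby standard library.

```ruby
require 'json'

# Illustrative container for the Time / Tag / Record triple.
Event = Struct.new(:time, :tag, :record) do
  # Render as "<time> <tag> <json record>", with time kept in whole seconds.
  def to_line
    stamp = Time.at(time).utc.strftime("%Y-%m-%d %H:%M:%S")
    "#{stamp} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2013, 10, 30, 1, 33, 51).to_i,        # Time: seconds since the epoch
  "apache.log",                                  # Tag: used for routing
  { "host" => "127.0.0.1", "method" => "GET" }   # Record: JSON-like hash
)
puts event.to_line
# prints: 2013-10-30 01:33:51 apache.log {"host":"127.0.0.1","method":"GET"}
```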

Page 15: Fluentd introduction at ipros

Pluggable Architecture

[Diagram: Engine connecting Input, Buffer and Output; components marked Pluggable]

Input
  > Forward
  > HTTP
  > File tail
  > dstat
  > ...

Buffer
  > File
  > Memory

Output
  > Forward
  > File
  > MongoDB
  > ...

Output
  > rewrite
  > ...

Page 16: Fluentd introduction at ipros

Client libraries

[Diagram: Application -> Fluentd, events sent as Time:Tag:Record]

# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2013-10-30 18:56:01 myapp.login {"user":38}

> Ruby
> Java
> Perl
> PHP
> Python
> D
> Scala
> ...

Page 17: Fluentd introduction at ipros

Configuration and operation

> No central / master node
  > HTTP include helps conf sharing

> Operation depends on your environment
  > Use your daemon management
  > Use Chef in Treasure Data

> Apache-like syntax and Ruby DSL

Page 18: Fluentd introduction at ipros

# receive events via HTTP
<source>
  type http
  port 8888
</source>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  database apache
  collection log
</match>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# forward other logs to servers
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

include http://example.com/conf
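How the three match sections above divide up events can be sketched as follows. In Fluentd, `*` matches exactly one tag part, `**` matches zero or more parts, and the first matching section wins. The `tag_matches?` and `route` helpers and the `RULES` table are hypothetical names for this sketch, which only handles `**` as the last part of a pattern (the only form used in this configuration).

```ruby
# Simplified tag matcher: "*" is one tag part, a trailing "**" is zero or more parts.
# Assumption: "**" only appears as the final part of the pattern.
def tag_matches?(pattern, tag)
  pat = pattern.split(".")
  parts = tag.split(".")

  if pat.last == "**"
    pat = pat[0...-1]
    return false if parts.length < pat.length
    parts = parts.first(pat.length)   # ignore whatever the "**" swallows
  end

  return false unless pat.length == parts.length
  pat.zip(parts).all? { |p, t| p == "*" || p == t }
end

# Patterns in the same order as the configuration; the symbols are just labels.
RULES = [
  ["apache.access", :mongo],
  ["alert.**",      :file],
  ["**",            :forward]
]

# The first matching rule wins, so the catch-all "**" must come last.
def route(tag)
  rule = RULES.find { |pattern, _output| tag_matches?(pattern, tag) }
  rule && rule.last
end

p route("apache.access")    # => :mongo
p route("alert.disk.full")  # => :file
p route("myapp.login")      # => :forward
```

Because sections are tried in order, moving the `**` section to the top would send every event to the forward output.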

Page 19: Fluentd introduction at ipros

Reliability (core + plugin)

> Buffering
  > Use file buffer for persistent data
  > Each buffer chunk has an ID for idempotency

> Retrying

> Error handling
  > transaction, failover, etc. on forward plugin
  > secondary
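Why a chunk ID helps with retries can be shown with a small sketch. It is not Fluentd code: the `Chunk` struct and `Receiver` class are hypothetical, and whether a given destination deduplicates this way depends on the plugin. The idea is that a sender may resend a chunk when an acknowledgement is lost, and the receiver uses the ID to store it only once.

```ruby
require 'securerandom'
require 'set'

# A buffer chunk: a batch of events plus a unique ID (illustrative).
Chunk = Struct.new(:id, :events)

class Receiver
  attr_reader :stored

  def initialize
    @seen = Set.new
    @stored = []
  end

  # Store the chunk's events unless this ID was already processed.
  # Always acknowledge, so the sender can drop the chunk either way.
  def receive(chunk)
    @stored.concat(chunk.events) if @seen.add?(chunk.id)
    :ack
  end
end

receiver = Receiver.new
chunk = Chunk.new(SecureRandom.hex(8), [{ "user" => 38 }, { "user" => 39 }])

receiver.receive(chunk)   # first delivery
receiver.receive(chunk)   # retry after a lost ack: no duplicates stored
puts receiver.stored.length
```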

Page 20: Fluentd introduction at ipros

Plugins - use rubygems

$ fluent-gem search -rd fluent-plugin

$ fluent-gem search -rd fluent-mixin

$ fluent-gem install fluent-plugin-mongo


Page 21: Fluentd introduction at ipros

http://fluentd.org/plugin/

Page 22: Fluentd introduction at ipros

in_tail

[Diagram: Apache -> access.log -> Fluentd]

✓ read a log file
✓ custom regexp
✓ custom parser in Ruby

Supported format:

> apache
> apache2
> syslog
> nginx
> json
> csv
> tsv
> ltsv

Page 23: Fluentd introduction at ipros

out_mongo

[Diagram: Apache -> access.log -> Fluentd (buffer)]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
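The "exponential retry wait" behaviour can be sketched like this. The doubling factor, the base wait, the cap and the helper names `retry_waits` and `with_retry` are assumptions made for the example, not Fluentd's actual defaults or API.

```ruby
# Wait times in seconds before each retry: base, 2*base, 4*base, ...,
# never exceeding max_wait.
def retry_waits(base, attempts, max_wait)
  (0...attempts).map { |n| [base * (2**n), max_wait].min }
end

# Run the block, retrying after each wait when it raises.
# `sleeper` is injectable so the sketch can be exercised without real delays.
def with_retry(waits, sleeper: ->(seconds) { sleep(seconds) })
  waits.each do |wait|
    begin
      return yield
    rescue StandardError
      sleeper.call(wait)
    end
  end
  yield # final attempt; an error here propagates to the caller
end

p retry_waits(1, 6, 20)   # => [1, 2, 4, 8, 16, 20]

# Example: a write that fails twice before succeeding.
calls = 0
result = with_retry(retry_waits(1, 3, 20), sleeper: ->(_seconds) {}) do
  calls += 1
  raise "destination unavailable" if calls < 3
  :written
end
p result   # => :written
```

In a real buffer the chunk would stay on disk between attempts, which is what "persistent on a file" refers to.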

Page 24: Fluentd introduction at ipros

out_webhdfs

[Diagram: Apache -> access.log -> Fluentd (buffer) -> HDFS]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file

✓ slice files based on time
  2013-01-01/01/access.log.gz
  2013-01-01/02/access.log.gz
  2013-01-01/03/access.log.gz
  ...

✓ custom text formatter
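Time-based slicing amounts to deriving the output path from each event's timestamp. A minimal sketch, assuming an hourly `strftime` pattern that reproduces the paths above and UTC timestamps; `sliced_path` and `group_by_slice` are hypothetical helpers, not part of out_webhdfs.

```ruby
# Hourly slice pattern matching the example paths (assumption).
SLICE_FORMAT = "%Y-%m-%d/%H/access.log.gz"

# Path of the file that should hold an event with the given time.
def sliced_path(time, format = SLICE_FORMAT)
  time.getutc.strftime(format)
end

# Group [epoch_seconds, line] pairs by their destination file.
def group_by_slice(events)
  events.group_by { |seconds, _line| sliced_path(Time.at(seconds)) }
end

events = [
  [Time.utc(2013, 1, 1, 1, 5, 0).to_i,  "GET /a"],
  [Time.utc(2013, 1, 1, 1, 59, 59).to_i, "GET /b"],
  [Time.utc(2013, 1, 1, 2, 0, 0).to_i,  "GET /c"]
]
group_by_slice(events).each do |path, rows|
  puts "#{path}: #{rows.length} line(s)"
end
# prints:
# 2013-01-01/01/access.log.gz: 2 line(s)
# 2013-01-01/02/access.log.gz: 1 line(s)
```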

Page 25: Fluentd introduction at ipros

out_copy + other plugins

[Diagram: Apache -> access.log -> Fluentd (buffer) -> Amazon S3 and Hadoop]

✓ routing based on tags
✓ copy to multiple storages

Page 26: Fluentd introduction at ipros

out_forward

[Diagram: Apache -> access.log -> Fluentd (buffer) -> three downstream Fluentd nodes; label: apache]

✓ automatic fail-over
✓ load balancing

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
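Load balancing and fail-over can be sketched as a weighted random choice among the servers that are currently reachable. This is an illustration only and out_forward's real algorithm may differ: the `Server` struct, its `available` flag and the `pick_server` helper are hypothetical, and the hosts and weights are sample values.

```ruby
# A forward destination with a relative weight and a health flag (illustrative).
Server = Struct.new(:host, :weight, :available)

# Pick an available server with probability proportional to its weight.
# Returns nil when no server is available. Weights are assumed to be positive.
def pick_server(servers, rng = Random.new)
  candidates = servers.select(&:available)
  return nil if candidates.empty?

  point = rng.rand(candidates.sum(&:weight)) # integer in 0...total weight
  candidates.each do |server|
    return server if point < server.weight
    point -= server.weight
  end
  nil
end

servers = [
  Server.new("192.168.0.11", 20, true),
  Server.new("192.168.0.12", 60, true)
]

# With weights 20 and 60, roughly one pick in four goes to the first host.
counts = Hash.new(0)
1000.times { counts[pick_server(servers).host] += 1 }
p counts

# Fail-over: once a server is marked unavailable, all traffic goes to the rest.
servers[1].available = false
p pick_server(servers).host   # => "192.168.0.11"
```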

Page 27: Fluentd introduction at ipros

Forward topology

[Diagram: seven Fluentd nodes; links labelled send/ack]

Page 28: Fluentd introduction at ipros

Other plugins

> Filter, Aggregator, Converter
  > rewrite-tag-filter, sampling-filter, ...
  > *-counter, *-monitor, ...
  > record-modifier, flatten, map, typecast, ...

> See @tagomoris's slide
  > http://www.slideshare.net/tagomoris/fluentd-meetupfukuoka201303

Page 29: Fluentd introduction at ipros

[Diagram: Fluentd (filter / buffer / routing) sits between log sources and destinations]

Sources
  > Access logs: Apache
  > App logs: Frontend, Backend
  > System logs: syslogd
  > Databases

Destinations
  > Alerting: Nagios
  > Analysis: MongoDB, MySQL, Hadoop
  > Archiving: Amazon S3

Page 30: Fluentd introduction at ipros

Other status

> Localizing docs into Japanese
  > https://github.com/fluent/fluentd-docs/tree/master/docs/ja

> Windows support
  > Started by JBAT
  > https://github.com/fluent/fluentd/tree/windows
  > Feedback and patches are welcome!

Page 31: Fluentd introduction at ipros

v11

> Spec is not fixed yet

> Breaking source code compatibility

> Several improvements
  > routing label, filter, error stream, etc.
  > serverengine based: multi-process, signal, etc.

> http://magazine.rubyist.net/?0044-FluentdV11NewFeatures


Page 32: Fluentd introduction at ipros

td-agent

> Open-source distribution package of Fluentd
  > ETL part of Treasure Data
  > deb, rpm, homebrew

> Including useful components
  > ruby, jemalloc, fluentd
  > 3rd party gems: td, mongo, webhdfs, etc.

> http://packages.treasure-data.com/

Page 33: Fluentd introduction at ipros

Product Comparison


Page 34: Fluentd introduction at ipros

Flume

Flume: distributed log collector by Cloudera

[Diagram: Flume nodes, a Flume Master and Hadoop HDFS, shown as a Physical Topology and a Logical Topology]

Page 35: Fluentd introduction at ipros

Network topology

[Diagram: two Agent -> Collector topologies, labelled Flume OG and Flume NG. Link labels: send, ack, send/ack. Node labels: Master (several), Option]

Page 36: Fluentd introduction at ipros

Pros and Cons

> Pros
  > Using central master to manage all nodes

> Cons
  > Java culture (Pros for Java-er?)
    > Difficult configuration and setup
  > Difficult topology
  > Mainly for Hadoop
    > fewer plugins?

Page 37: Fluentd introduction at ipros

Logstash

http://logstash.net/


Page 38: Fluentd introduction at ipros

Pros and Cons

> Pros
  > Bundled 140 plugins (input/filter/codec/output)
  > Built-in ElasticSearch and Kibana
  > Works on Windows but unstable...

> Cons
  > mainly for JRuby
  > Need external daemon for centralized env
    > Redis, RabbitMQ, etc.

Page 39: Fluentd introduction at ipros

Use cases


Page 40: Fluentd introduction at ipros

Treasure Data

[Diagram: Frontend, Job Queue, Worker and Hadoop nodes -> Fluentd -> Librato Metrics and Treasure Data]

> Applications push metrics to Fluentd (via local Fluentd)
> Fluentd sums up data minutes (partial aggregation)
> Librato Metrics for realtime analysis
> Treasure Data for historical analysis

Page 41: Fluentd introduction at ipros

Cookpad

[Diagram: hundreds of app servers, each with a Rails app that sends event logs to a td-agent; td-agents feed Treasure Data; Daily/Hourly Batch; MySQL; Google Spreadsheet; KPI visualization; Feedback rankings]

Logs are available after several mins.

✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact

✓ Over 100 RoR servers (2012/2/4)

Page 42: Fluentd introduction at ipros

LINE

by @tagomoris

http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013

✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)

[Diagram: Web Servers -> Fluentd Cluster; other components: Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), webhdfs, Hadoop Cluster CDH4 (HDFS, YARN), Huahin Manager, hiveserver, Shib, ShibUI; path labels: STREAM, BATCH, SCHEDULED BATCH]

Page 44: Fluentd introduction at ipros

Other companies


Page 45: Fluentd introduction at ipros

Conclusion

> Fluentd is now a widely used project

> There are many use cases

> Many contributors and plugins

> Keep it simple

> Easy to use and integrate into your environment