Page 1: The basics of fluentd

Structured logging

Reliable forwarding

Pluggable architecture

http://fluentd.org/

Page 2: The basics of fluentd

Agenda

> Background

> Overview

> Product Comparison

> Use cases

Page 3: The basics of fluentd

Background

Page 4: The basics of fluentd

Data Processing

[Diagram: Data source → Collect → Store → Process → Visualize → Reporting, Monitoring]

Page 5: The basics of fluentd

Related Products

Store / Process: Cloudera, Hortonworks, Treasure Data

Visualize: Tableau, Excel, R

easier & shorter time

Collect: ???

Page 6: The basics of fluentd
Page 7: The basics of fluentd

Before Fluentd

[Diagram: Application on Server1, Server2 and Server3; labels: Fluent, Log]

High latency! Must wait for a day...

Page 8: The basics of fluentd

After Fluentd

[Diagram: Application on Server1, Server2 and Server3; a row of three Fluentd nodes and a row of two Fluentd nodes]

In streaming!

Page 9: The basics of fluentd

Overview

Page 10: The basics of fluentd

> Open source log collector written in Ruby

> Reliable, scalable and easy to extend

> Uses the RubyGems ecosystem for plugins

In short

It’s like syslogd, but uses JSON for log messages

Page 11: The basics of fluentd

Example (apache to mongo)

[Diagram: Web Server → tail → Fluentd (event buffering) → insert]

127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:30] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:32] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:40] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:27:01] "GET / ...
...

2012-02-04 01:33:51 apache.log { "host": "127.0.0.1", "method": "GET", ... }
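The tail step turns each raw access-log line into a structured record. A minimal sketch of that idea in plain Ruby follows. The regular expression is a simplified stand-in, not Fluentd's actual apache parser, and the sample line, the `parse_access_line` helper and the field names are assumptions made for illustration (the lines on the slide are truncated).

```ruby
require "time"

# Simplified pattern for a common-log-style line (an assumption for this
# sketch; it is not the pattern Fluentd ships with).
APACHE_LINE = /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<code>\d{3}) (?<size>\S+)\z/

# Returns [time, tag, record], or nil when the line does not match.
def parse_access_line(line, tag)
  m = APACHE_LINE.match(line.chomp)
  return nil unless m

  # Event time in seconds, taken from the log line itself.
  time = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z").to_i
  record = {
    "host" => m[:host],
    "method" => m[:method],
    "path" => m[:path],
    "code" => m[:code].to_i,
    "size" => m[:size].to_i # "-" becomes 0
  }
  [time, tag, record]
end

# Hypothetical complete line; the tag is chosen by the caller.
sample = '127.0.0.1 - - [11/Dec/2012:07:26:27 +0000] "GET / HTTP/1.1" 200 777'
p parse_access_line(sample, "apache.access")
```

Lines that do not match return nil, so a caller can decide whether to drop them or route them elsewhere.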

Page 12: The basics of fluentd

Event structure (log message)

✓ Time
> second unit by default
> from data source or adding parsed time

✓ Tag
> for message routing

✓ Record
> JSON format
> MessagePack internally
> not unstructured
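The time / tag / record triple can be modelled as a small value object. The sketch below is only an illustration: the `Event` struct and its `to_line` method are hypothetical names, and JSON from the standard library stands in for the serialization step.

```ruby
require "json"

# Hypothetical value object for this sketch: time in whole seconds,
# a dot-separated tag, and a record of key/value pairs.
Event = Struct.new(:time, :tag, :record) do
  # Render in the "time tag record" form shown on the slides.
  def to_line
    stamp = Time.at(time).utc.strftime("%Y-%m-%d %H:%M:%S")
    "#{stamp} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(Time.utc(2012, 12, 11, 7, 56, 1).to_i, "myapp.login", { "user" => 38 })
puts event.to_line
# prints: 2012-12-11 07:56:01 myapp.login {"user":38}
```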

Page 13: The basics of fluentd

Pluggable Architecture

Input
> Forward > HTTP > File tail > dstat > ...

Engine

Buffer
> File > Memory

Output
> Forward > File > MongoDB > ...

Output
> rewrite > ...

Pluggable Pluggable

Page 14: The basics of fluentd

Client libraries

[Diagram: Application → Time:Tag:Record → Fluentd]

# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2012-12-11 07:56:01 myapp.login {"user":38}

> Ruby > Java > Perl > PHP > Python > D > Scala > ...

Page 15: The basics of fluentd

Configuration and operation

> No central / master node
  > HTTP include helps configuration sharing

> Operation depends on your environment
  > Use your daemon management
  > Use Chef in Treasure Data

> Apache-like syntax and Ruby DSL

Page 16: The basics of fluentd

# receive events via HTTP
<source>
  type http
  port 8888
</source>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  database apache
  collection log
</match>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# forward other logs to servers
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

include http://example.com/conf
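The match sections route events by tag, and the first section whose pattern matches wins. The sketch below imitates that routing in plain Ruby for the three patterns in this configuration. It is a simplification, not Fluentd's matcher: it only handles literal parts, `*` (one tag part) and a trailing `**` (zero or more tag parts), and the names `tag_pattern_to_regexp`, `RULES` and `route` are made up for illustration.

```ruby
# Convert a match pattern to a Regexp (simplified: literal parts, "*",
# and a trailing "**" only).
def tag_pattern_to_regexp(pattern)
  parts = pattern.split(".")
  trailing_any = parts.last == "**"
  parts.pop if trailing_any

  body = parts.map { |part| part == "*" ? "[^.]+" : Regexp.escape(part) }.join("\\.")
  if trailing_any
    # "**" alone matches any tag; "a.**" matches "a", "a.b", "a.b.c", ...
    body = body.empty? ? ".*" : "#{body}(?:\\..+)?"
  end
  Regexp.new("\\A#{body}\\z")
end

# Patterns in the same order as the configuration, paired with output types.
RULES = [
  ["apache.access", "mongo"],
  ["alert.**", "file"],
  ["**", "forward"]
].map { |pattern, output| [tag_pattern_to_regexp(pattern), output] }

# Return the output type of the first rule whose pattern matches the tag.
def route(tag)
  rule = RULES.find { |regexp, _output| regexp.match?(tag) }
  rule && rule[1]
end

%w[apache.access alert.disk.full myapp.login].each do |tag|
  puts "#{tag} -> #{route(tag)}"
end
```

Because rules are tried in order, the catch-all `**` pattern has to come last, as it does in the configuration.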

Page 17: The basics of fluentd

Reliability (core + plugin)

> Buffering
  > Use file buffer for persistent data
  > Buffer chunks have IDs for idempotency

> Retrying

> Error handling
  > transaction, failover, etc. on forward plugin
  > secondary for backup
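Chunk IDs make retries safe: if a chunk is sent twice, the receiver can recognize the ID and skip the second copy. The slides do not describe how this is implemented, so the following is only a sketch of the general idea. `Chunk`, `new_chunk` and `IdempotentStore` are hypothetical names.

```ruby
require "securerandom"

# Hypothetical buffer chunk: a batch of events with a unique ID.
Chunk = Struct.new(:id, :events)

def new_chunk(events)
  Chunk.new(SecureRandom.hex(16), events)
end

# Hypothetical receiver that remembers the chunk IDs it has stored,
# so a re-sent chunk is not written twice.
class IdempotentStore
  attr_reader :stored

  def initialize
    @seen_ids = {}
    @stored = []
  end

  # Returns true if the chunk was written, false if it was a duplicate.
  def write(chunk)
    return false if @seen_ids.key?(chunk.id)

    @seen_ids[chunk.id] = true
    @stored.concat(chunk.events)
    true
  end
end

store = IdempotentStore.new
chunk = new_chunk([{ "user" => 38 }, { "user" => 39 }])
store.write(chunk) # first delivery is stored
store.write(chunk) # retry of the same chunk is ignored
puts store.stored.size
```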

Page 18: The basics of fluentd

Plugins - use rubygems

$ fluent-gem search -rd fluent-plugin

$ fluent-gem search -rd fluent-mixin

$ fluent-gem install fluent-plugin-mongo

Page 19: The basics of fluentd

http://www.fluentd.org/plugins

Page 20: The basics of fluentd

in_tail

[Diagram: Apache, access.log, Fluentd]

✓ read a log file
✓ read log files in directory
✓ custom regexp
✓ custom parser in Ruby

Supported format:
> apache > apache2 > syslog > nginx
> json > csv > tsv > ltsv
> none > multiline

Page 21: The basics of fluentd

out_mongo

[Diagram: Apache, access.log, Fluentd, buffer]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
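Automatic retry with an exponentially growing wait can be sketched as follows. The base wait, the doubling factor, the cap and the retry limit are illustrative assumptions, not Fluentd's defaults, and `retry_wait` and `with_retries` are hypothetical helpers. The sleep function is injectable so the example runs without actually waiting.

```ruby
# Wait before the nth retry (1-based): base, base*2, base*4, ...
# capped at max_wait.
def retry_wait(attempt, base: 1.0, max_wait: 60.0)
  [base * (2**(attempt - 1)), max_wait].min
end

# Run the block until it succeeds or max_retries is exhausted.
# The last error is re-raised when retries run out.
def with_retries(max_retries: 5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue StandardError
    attempt += 1
    raise if attempt > max_retries

    sleeper.call(retry_wait(attempt))
    retry
  end
end

# Simulated output that fails three times before succeeding.
waits = []
calls = 0
result = with_retries(sleeper: ->(seconds) { waits << seconds }) do
  calls += 1
  raise IOError, "output unavailable" if calls < 4

  :flushed
end
p result, waits
```

Combined with a file-backed buffer, a scheme like this lets data survive both short outages and process restarts.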

Page 22: The basics of fluentd

out_webhdfs

[Diagram: Apache, access.log, Fluentd, buffer, HDFS]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
✓ slice files based on time
  2013-01-01/01/access.log.gz
  2013-01-01/02/access.log.gz
  2013-01-01/03/access.log.gz
  ...
✓ custom text formatter
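Time-based slicing amounts to deriving the output path from each event's time. The sketch below reproduces the hourly layout of the example paths with `strftime`. Using UTC and the helper names `sliced_path` and `slice_by_hour` are assumptions of this sketch; the plugin's own options are not shown on the slide.

```ruby
# Build the output path for an event time, one file per hour.
# The strftime layout mirrors the example paths; UTC is an assumption.
def sliced_path(time, format: "%Y-%m-%d/%H/access.log.gz")
  time.getutc.strftime(format)
end

# Group [epoch_seconds, line] pairs by their hourly path.
def slice_by_hour(events)
  events.group_by { |epoch, _line| sliced_path(Time.at(epoch)) }
        .transform_values { |pairs| pairs.map { |_epoch, line| line } }
end

events = [
  [Time.utc(2013, 1, 1, 1, 5).to_i, "GET /"],
  [Time.utc(2013, 1, 1, 1, 50).to_i, "GET /about"],
  [Time.utc(2013, 1, 1, 2, 0).to_i, "GET /help"]
]
slice_by_hour(events).each { |path, lines| puts "#{path}: #{lines.size} line(s)" }
```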

Page 23: The basics of fluentd

out_copy + other plugins

[Diagram: Apache, access.log, Fluentd, buffer, Hadoop, Amazon S3]

✓ routing based on tags
✓ copy to multiple storages

Page 24: The basics of fluentd

out_forward

[Diagram: Apache, access.log, Fluentd with buffer, three more Fluentd nodes; label: apache]

✓ automatic fail-over
✓ load balancing
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
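Load balancing with fail-over can be pictured as weighted selection among the servers that are currently reachable. The sketch below uses the two hosts and weights from the forward configuration earlier in the deck. It is an illustration only: the `Server` struct, the `available` flag and `pick_server` are hypothetical, and the way the real plugin detects failures is not covered here.

```ruby
Server = Struct.new(:host, :weight, :available)

# Pick a server with probability proportional to its weight, skipping
# servers marked unavailable. Returns nil when none is available.
# `rng` is injectable so results can be made repeatable.
def pick_server(servers, rng: Random.new)
  candidates = servers.select(&:available)
  return nil if candidates.empty?

  total = candidates.sum(&:weight)
  point = rng.rand(total) # integer in 0...total
  candidates.each do |server|
    return server if point < server.weight

    point -= server.weight
  end
end

servers = [
  Server.new("192.168.0.11", 20, true),
  Server.new("192.168.0.12", 60, true)
]

counts = Hash.new(0)
rng = Random.new(1)
1000.times { counts[pick_server(servers, rng: rng).host] += 1 }
p counts # roughly a 1:3 split

# Fail-over: once a server is marked unavailable, traffic goes to the rest.
servers[1].available = false
puts pick_server(servers).host
```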

Page 25: The basics of fluentd

Forward topology

[Diagram: seven Fluentd nodes connected by send/ack links]

Page 26: The basics of fluentd

[Diagram: Fluentd in the center does filter / buffer / routing]

Inputs:
> Access logs: Apache
> App logs: Frontend, Backend
> System logs: syslogd
> Databases

Outputs:
> Alerting: Nagios
> Analysis: MongoDB, MySQL, Hadoop
> Archiving: Amazon S3

Page 29: The basics of fluentd

td-agent

> Open source distribution package of Fluentd

> ETL part of Treasure Data

> deb, rpm, dmg (since td-agent 2.0)

> Including useful components
  > ruby, jemalloc, fluentd
  > 3rd party gems: td, mongo, webhdfs, etc.

> http://packages.treasure-data.com/

Page 30: The basics of fluentd

v1

> New features without breaking compatibility

> Filter, Label and better error handling

> ServerEngine-based: multi-process, signal, etc.

> New configuration and DSL format

> JRuby and Windows support

> GitHub issue: Plan for v1 release #251

Page 31: The basics of fluentd

Use cases

Page 32: The basics of fluentd

Treasure Data

[Diagram: Frontend, Job Queue, Worker, Hadoop, Hadoop, Fluentd]

Applications push metrics to Fluentd (via local Fluentd)

Fluentd sums up data by minute (partial aggregation)

> Librato Metrics for realtime analysis
> Treasure Data for historical analysis
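Partial aggregation means summing metrics into per-minute buckets before they are shipped on, so downstream systems receive one value per metric per minute instead of every raw sample. A minimal sketch, assuming samples arrive as `[epoch_seconds, name, value]` triples (a made-up shape for illustration; `aggregate_by_minute` is a hypothetical helper):

```ruby
# Sum metric values into per-minute buckets keyed by
# [minute_start_epoch, metric_name].
def aggregate_by_minute(samples)
  totals = Hash.new(0)
  samples.each do |epoch, name, value|
    minute_start = epoch - (epoch % 60)
    totals[[minute_start, name]] += value
  end
  totals
end

base = Time.utc(2013, 1, 1, 0, 0, 0).to_i
samples = [
  [base + 1,  "requests", 2],
  [base + 59, "requests", 3],
  [base + 60, "requests", 5],
  [base + 10, "errors",   1]
]
aggregate_by_minute(samples).each do |(minute, name), total|
  puts "#{Time.at(minute).utc.strftime('%H:%M')} #{name}=#{total}"
end
```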

Page 33: The basics of fluentd

Cookpad

✓ Over 100 RoR servers (2012/2/4)

[Diagram: hundreds of app servers (Rails app + td-agent) send event logs to Treasure Data; Daily/Hourly Batch; MySQL; Google Spreadsheet; KPI visualization; Feedback rankings]

Logs are available after several mins.

✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact

Page 34: The basics of fluentd

NHN Japan

by @tagomoris

http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013

✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)

[Diagram: Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Graph Tools; Notifications (IRC); Hadoop Cluster CDH4 (HDFS, YARN); webhdfs; Huahin Manager; hiveserver; Shib; ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH]

Page 35: The basics of fluentd

Other use cases

> Collect sensor logs

> Embedded devices, Raspberry Pi, etc.

> Integrated with Elasticsearch and Kibana

> Integrated with Norikra CEP engine

http://www.fluentd.org/guides

Page 36: The basics of fluentd

Other companies

http://www.fluentd.org/testimonials

Page 37: The basics of fluentd

Conclusion

> Fluentd is a widely used log collector

> There are many use cases

> Many contributors and plugins

> Keep it simple

> Easy to integrate into your environment