Posted on 08-May-2015
Fluentd
The Event Collector Service
Sadayuki Furuhashi (@frsyuki)
Treasure Data, Inc.
Fluentd in brief
• Structured logging
• Pluggable architecture
• Reliable forwarding
It's like syslogd, but uses JSON for log messages
Fluentd :: format of logs
Application → Fluentd → Storage
2012-02-04 01:33:51 myapp.buylog {"user": "me", "path": "/buyItem", "price": 150, "referer": "/landing"}
Each event consists of a time (2012-02-04 01:33:51), a tag (myapp.buylog), and a record (the JSON object).
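A minimal Ruby sketch of the time/tag/record model, assuming the record is a plain Hash serialized with Ruby's standard `json` library. The `Event` struct and `format_event` helper are hypothetical names for illustration, not Fluentd internals.

```ruby
require 'json'

# Hypothetical representation of one event: when it happened (time),
# what kind of event it is (tag), and the structured payload (record).
Event = Struct.new(:time, :tag, :record)

# Render an event in the "time tag record" layout shown on the slide.
def format_event(event)
  "#{event.time.strftime('%Y-%m-%d %H:%M:%S')} #{event.tag} #{JSON.generate(event.record)}"
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51),
  'myapp.buylog',
  { 'user' => 'me', 'path' => '/buyItem', 'price' => 150, 'referer' => '/landing' }
)

puts format_event(event)
# prints: 2012-02-04 01:33:51 myapp.buylog {"user":"me","path":"/buyItem","price":150,"referer":"/landing"}

# Because the record is structured, a field is read directly; no text parsing needed.
puts event.record['price']   # prints: 150
```

The tag is what later routing decisions key on, while the record stays machine-readable all the way to storage.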
Fluentd :: plugins
Application, File tail, Scribe, syslogd → (input plug-ins) → Fluentd: filter / buffer / routing → (output plug-ins) → Fluentd, Storage, SaaS
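Routing is driven by tags. A simplified Ruby sketch, assuming Fluentd's documented wildcard rules: `*` matches exactly one dot-separated tag part, `**` matches zero or more parts, and `<match>` sections are tried in configuration order. The helper names and the `ROUTES` table are hypothetical, and real Fluentd supports further pattern forms (such as `{a,b}`) that this sketch does not handle.

```ruby
# Match tag parts against pattern parts.
#   '*'  matches exactly one tag part
#   '**' matches zero or more tag parts
#   anything else must equal the tag part literally
def match_parts?(pattern, tag)
  return tag.empty? if pattern.empty?

  head, *rest = pattern
  case head
  when '**'
    (0..tag.size).any? { |skip| match_parts?(rest, tag.drop(skip)) }
  when '*'
    !tag.empty? && match_parts?(rest, tag.drop(1))
  else
    !tag.empty? && tag.first == head && match_parts?(rest, tag.drop(1))
  end
end

def tag_match?(pattern, tag)
  match_parts?(pattern.split('.'), tag.split('.'))
end

# Hypothetical routing table in config order: the first matching pattern wins,
# so the catch-all '**' must come last.
ROUTES = [
  ['apache.access', 'mongo'],
  ['**', 'forward']
].freeze

def route(tag)
  ROUTES.find { |pattern, _output| tag_match?(pattern, tag) }&.last
end

puts route('apache.access')  # prints: mongo
puts route('myapp.login')    # prints: forward
```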
Fluentd :: client libraries
• Client libraries
  > Ruby > Perl > PHP > Python > Java > ...
require 'fluent-logger'
log = Fluent::Logger::FluentLogger.new('myapp', :host => 'localhost', :port => 24224)
log.post('login', {'user' => 38})
#=> 2012-02-04 04:56:01 myapp.login {"user":38}
Application → Fluentd
Typical architecture before Fluentd
App server: Application → File, File, File, ... (×3 servers) → Log server: File
• Burst of traffic
• High latency: must wait for a day
• Hard to analyze: complex text parsers
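To illustrate the kind of per-format text parser the file-based setup requires, here is a small Ruby sketch that parses one Apache "common" log line with a regular expression. The pattern is a simplified assumption and will not cover every real-world variant (for example, escaped quotes inside the request field); mapping a missing size (`-`) to 0 is also an arbitrary choice for this sketch.

```ruby
require 'json'

# Simplified pattern for Apache "common" log format (an assumption; real logs vary).
APACHE_COMMON = /\A(?<host>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<protocol>[^"]*)" (?<code>\d{3}) (?<size>\d+|-)\z/

# Turn one text line into a Hash, or nil if the line does not fit the pattern.
def parse_apache(line)
  m = APACHE_COMMON.match(line)
  return nil unless m

  record = m.named_captures
  record['code'] = record['code'].to_i
  # '-' means no body was sent; this sketch records it as 0 bytes.
  record['size'] = record['size'] == '-' ? 0 : record['size'].to_i
  record
end

text_line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
p parse_apache(text_line)

# A structured (JSON) record needs no custom parser at all.
json_line = '{"host":"127.0.0.1","path":"/apache_pb.gif","code":200}'
p JSON.parse(json_line)
```

Every new text format needs another pattern like this, and each one can silently drop lines it fails to match; structured records sidestep that work.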
Architecture after Fluentd
App server: Application → Fluentd (×3 servers) → Fluentd, Fluentd (aggregators) → Hadoop / Hive, MongoDB, Amazon S3 / EMR
• Realtime!
• Ready to analyze!
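Reliable forwarding between Fluentd nodes rests on buffering: events are grouped into chunks, and a chunk is dropped only after it has been delivered. The Ruby sketch below is a hypothetical, in-memory illustration of that retry-until-delivered idea; `SimpleBuffer` and its methods are invented names, and Fluentd's real buffer plugins add features this omits, such as file-backed persistence and retry back-off.

```ruby
# Hypothetical in-memory buffer: events fill a chunk; full chunks are queued;
# a queued chunk is removed only after the sender reports success.
class SimpleBuffer
  def initialize(chunk_limit)
    @chunk_limit = chunk_limit
    @queue = []     # closed chunks waiting to be sent
    @current = []   # chunk currently being filled
  end

  def push(event)
    @current << event
    enqueue_current if @current.size >= @chunk_limit
  end

  # Move a partially filled chunk to the queue (e.g. when a flush timer fires).
  def enqueue_current
    @queue << @current unless @current.empty?
    @current = []
  end

  # Send queued chunks in order; stop at the first failure so the failed
  # chunk stays at the head of the queue and is retried on the next flush.
  def flush(&sender)
    until @queue.empty?
      break unless sender.call(@queue.first)
      @queue.shift
    end
  end

  def pending_chunks
    @queue.size
  end
end

buffer = SimpleBuffer.new(2)
%w[a b c d].each { |e| buffer.push(e) }   # two full chunks: [a, b] and [c, d]

delivered = []
attempts = 0
flaky_sender = lambda do |chunk|
  attempts += 1
  next false if attempts == 1   # simulate the first send failing (e.g. a network error)
  delivered << chunk
  true
end

buffer.flush(&flaky_sender)   # first attempt fails; nothing is lost
puts buffer.pending_chunks    # prints: 2
buffer.flush(&flaky_sender)   # retry delivers both chunks
puts buffer.pending_chunks    # prints: 0
p delivered                   # prints: [["a", "b"], ["c", "d"]]
```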
Case study
Ruby on Rails servers, each with Fluentd → Fluentd aggregators (routing) → Hadoop / Hive and MongoDB, carrying PV logs and user behavior logs
✓ 127 RoR servers
✓ 70,000 msgs/sec
✓ 120 Mbps at peak
✓ 650 GB/day
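A back-of-the-envelope check of the case-study figures, assuming decimal units (1 GB = 10^9 bytes) and that the 650 GB/day is spread evenly over 24 hours. The second line additionally assumes that the 70,000 msgs/sec is a peak figure that coincides with the 120 Mbps peak.

```latex
\frac{650 \times 10^{9}\ \text{bytes/day}}{86\,400\ \text{s/day}} \approx 7.52 \times 10^{6}\ \text{bytes/s} \approx 60\ \text{Mbps (average)}

\frac{120 \times 10^{6}\ \text{bit/s}}{8 \times 70\,000\ \text{msg/s}} \approx 214\ \text{bytes/msg}
```

Under these assumptions the daily average is roughly half of the quoted peak, and a typical message is a few hundred bytes.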
# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  host 127.0.0.1
</match>

# forward other logs to servers
# (load-balancing + fail-over)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>
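The `forward` output spreads events across its `<server>` entries in proportion to `weight` and stops sending to servers it considers down. The Ruby sketch below imitates that effect with weighted random selection over healthy servers; it is a simplified assumption rather than out_forward's actual algorithm (the real plugin detects failures with heartbeats and has more options). `Server` and `pick_server` are hypothetical names. With weights 20 and 60, about 25% of events should go to 192.168.0.11 and 75% to 192.168.0.12.

```ruby
# Hypothetical server entry mirroring the <server> sections above.
Server = Struct.new(:host, :weight, :healthy)

# Pick a server with probability proportional to its weight, skipping servers
# marked unhealthy (fail-over). `rng` is injectable so results are reproducible.
def pick_server(servers, rng = Random.new)
  alive = servers.select(&:healthy)
  return nil if alive.empty?

  point = rng.rand(alive.sum(&:weight))   # integer in 0...total_weight
  alive.each do |s|
    return s if point < s.weight
    point -= s.weight
  end
  alive.last   # not reached; kept as a defensive fallback
end

servers = [
  Server.new('192.168.0.11', 20, true),
  Server.new('192.168.0.12', 60, true)
]

rng = Random.new(42)
counts = Hash.new(0)
8000.times { counts[pick_server(servers, rng).host] += 1 }
p counts   # roughly 2,000 vs 6,000 (a 25% / 75% split)

# Fail-over: with .12 marked down, every event goes to .11.
servers[1].healthy = false
puts pick_server(servers, rng).host   # prints: 192.168.0.11
```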
Scribe: log collector by Facebook
Frontend servers (scribe, scribe, scribe, ...) → aggregator nodes (scribe, scribe) → Hadoop HDFS
Scribe's pros & cons
• Pros
  > Fast (C++)
• Cons
  > VERY hard to install
  > Deals with unstructured logs: you must parse logs before analyzing them
  > Hard to extend: you must recompile C++ programs
  > No longer maintained?
Fluentd vs Scribe
• Easy to install
  > "gem install fluentd"
  > stable RPM and DEB packages: http://packages.treasure-data.com/
• Easy to write plugins
  > you can use Ruby
• Easy to distribute plugins
  > "gem search -rd fluent-plugin"
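Writing a plugin is easy because the engine only needs to map the name after `type` in a config section to a class. Below is a hypothetical, self-contained Ruby sketch of that registry pattern; `MiniPlugins` and `MemoryOutput` are invented for illustration. Actual Fluentd plugins subclass Fluentd's own base classes and register through Fluentd's plugin API (for example `Fluent::Plugin.register_output`), which this sketch does not use.

```ruby
require 'json'

# Hypothetical plugin registry: maps the name written after `type`
# in a config section to the class that implements it.
module MiniPlugins
  @outputs = {}

  def self.register_output(name, klass)
    @outputs[name] = klass
  end

  # Build the plugin named by conf['type'], failing loudly on unknown names.
  def self.new_output(conf)
    klass = @outputs.fetch(conf['type']) do
      raise ArgumentError, "unknown output type: #{conf['type']}"
    end
    klass.new(conf)
  end
end

# A toy output plugin that keeps formatted events in memory.
class MemoryOutput
  attr_reader :lines

  def initialize(conf)
    @prefix = conf.fetch('prefix', '')
    @lines = []
  end

  # Receives each routed event as tag, time, and record.
  def emit(tag, time, record)
    @lines << "#{@prefix}#{tag} #{time} #{JSON.generate(record)}"
  end
end

MiniPlugins.register_output('memory', MemoryOutput)

out = MiniPlugins.new_output('type' => 'memory', 'prefix' => '[demo] ')
out.emit('myapp.login', '2012-02-04 04:56:01', { 'user' => 38 })
puts out.lines.first   # prints: [demo] myapp.login 2012-02-04 04:56:01 {"user":38}
```

Because new behavior arrives as a registered class rather than a change to the engine, plugins can be shipped and installed independently as gems.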
Flume: distributed log collector by Cloudera
Flume agents (Flume, Flume, Flume) → Hadoop HDFS, coordinated by a Flume Master (physical topology vs. logical topology)
Flume's pros & cons
• Pros
  > Central master server manages all nodes
• Cons
  > Difficult to understand logical topologies, physical servers, and the mapping between them
  > Difficult to configure replicated master servers, log servers, and agents
  > Big footprint: 50,000 lines of Java code
Fluentd vs Flume
• Easy to understand
  > "syslogd that understands JSON"
• Easy to set up
  > "sudo fluentd --setup && fluentd"
• Very small footprint
  > small engine (3,000 lines) + plugins
• Easy to configure
Fluentd vs Scribe/Flume
                     Fluentd               Scribe              Flume
Installation         gem/rpm/deb           make                rpm/deb
Footprint            3,000 lines of Ruby   8,000 lines of C++  50,000 lines of Java
Plugin               Ruby                  N/A                 Java
Plugin distribution  RubyGems.org          N/A                 N/A
Master server        No                    No                  Yes
License              Apache License        Apache License      Apache License
Fluentd
• Documents
  > http://fluentd.org
• Source code
  > http://github.com/fluent
  > 14 committers across many organizations
• Mailing list
  > Google Groups
• Sadayuki Furuhashi
  > twitter: @frsyuki
• Treasure Data, Inc.
  > Software Engineer; founder
• Author of MessagePack
• Author of Fluentd