Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

Fluentd ♥ MongoDB: Log Everything As JSON. Kazuki Ohta, CTO at Treasure Data, Inc. Tuesday, July 17, 2012

description

Presentation given at the MongoDB SV User Group: http://www.meetup.com/MongoDB-SV-User-Group/events/72760092/

Transcript of Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

Page 1: Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

Fluentd ♥ MongoDB

Log Everything As JSON

Kazuki Ohta, CTO at Treasure Data, Inc.

Tuesday, July 17, 2012

Page 2

Self-Introduction

• Kazuki Ohta
  > twitter: @kzk_mover
  > github: kzk

• Treasure Data, Inc.
  > Chief Technology Officer; Founder
  > Original Fluentd Author @frsyuki is another co-founder.

• Open-Source Enthusiast
  > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc.
  > Fluentd rpm/deb package manager

Page 3

Logging? Why?

Page 4

Figure 1: Common Logging Purposes

Analytics

Error Notification

Recommendation

Page 5

Figure 2: Types of Logs

App Log
Access Log (Apache, Rails, etc.)
System Log (syslog etc.)
Others

Page 7

From “Scaling Lessons learned at Dropbox”

Fragile to format changes, no type information, no field names, etc.

Page 8

About Fluentd

Page 9

It's like syslogd, but uses JSON for log messages

Page 10

Logs in JSON? Why?

1. Machine-Readable
   > machines are going to be the main consumers of logs

2. Schema-Free
   > you want to add/remove fields from logs at any time

“Write Logs for Machines, use JSON”
http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/

Page 12

Logs As TEXT

"2011-04-01 host1 myapp: cmessage size=12MB user=me"

Logs As JSON

2011-04-01 myapp.message { "on_host": "host1", "combined": true, "size": 12000000, "user": "me" }

+ Field Name
+ No Custom Parser
+ Type Information
+ Schema Free
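The contrast between the two formats can be sketched in a few lines of Ruby. This is only an illustration built on the two sample lines from the slide; the regular expression and the group names used for the text line are assumptions, not something the talk prescribes.

```ruby
require "json"

# Text log from the slide: every consumer needs its own parser, and every
# value comes back as a String.
text_line = "2011-04-01 host1 myapp: cmessage size=12MB user=me"

# Hypothetical hand-written pattern; it stops matching when the format changes.
TEXT_PATTERN = /\A(?<date>\S+) (?<host>\S+) (?<app>\w+): (?<msg>\w+) size=(?<size>\S+) user=(?<user>\S+)\z/

def parse_text(line)
  m = TEXT_PATTERN.match(line)
  return nil unless m
  m.names.zip(m.captures).to_h
end

# The same event as a JSON record: field names and types travel with the data.
json_record = '{"on_host": "host1", "combined": true, "size": 12000000, "user": "me"}'

text_event = parse_text(text_line)
json_event = JSON.parse(json_record)

puts text_event["size"].inspect   # a String that still needs unit handling
puts json_event["size"].inspect   # a number, ready for arithmetic
puts json_event["combined"].inspect

# Appending a field defeats the text parser; a JSON consumer just sees a new key.
puts parse_text(text_line + " plan=free").inspect
```

The text parser returns `nil` for the extended line, which is the "fragile for format change" problem in miniature.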

Page 13

http://fluentd.org/

Page 14

• Website
  > http://fluentd.org/

• Community
  > http://github.com/fluent
  > 16 committers across many organizations
  > web, game, enterprise

• Mailing list
  > Google groups

Page 15

Fluentd Architecture

Page 18

Fluentd: Log Format

Application -> Fluentd -> Storage

2012-02-04 01:33:51 myapp.buylog { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }

time: 2012-02-04 01:33:51
tag: myapp.buylog
record: { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
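The three parts of an event (time, tag, record) map naturally onto a small value object. The sketch below is hypothetical: `Event` is a name invented here for illustration and is not a class from Fluentd itself.

```ruby
require "json"

# Hypothetical value object mirroring the three parts named on the slide.
Event = Struct.new(:time, :tag, :record) do
  # Render in the "time tag record" layout used on the slide.
  def to_s
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51),
  "myapp.buylog",
  { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
)

puts event.to_s

# Tags are dot-separated, so a consumer can split them when routing.
puts event.tag.split(".").inspect
```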

Page 22

Fluentd: Plugins

Inputs (plug-ins): Application, File (tail), Scribe, syslogd
Fluentd: filter / buffer / routing
Outputs (plug-ins): Fluentd, Storage, SaaS

Page 24

• Client libraries
  > Ruby
  > Perl
  > PHP
  > Python
  > Java
  > ...

    Fluent.open("myapp")
    Fluent.event("login", {"user"=>38})
    #=> 2012-02-04 04:56:01 myapp.login {"user":38}

Application -> (HTTP / TCP / UDS, with buffering) -> Fluentd
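What a client library does between the application and Fluentd can be sketched as follows. This is a simplified stand-in, not the real client API: `SketchLogger`, `flush_size`, and the `transport` callable are names assumed for illustration, and the transport is an in-memory lambda rather than an HTTP, TCP, or Unix domain socket connection.

```ruby
require "json"

# Hypothetical, simplified stand-in for a client library (NOT the real API).
# It prefixes the application tag and buffers events in memory, so a slow or
# unreachable collector does not block the application.
class SketchLogger
  def initialize(tag_prefix, flush_size:, transport:)
    @tag_prefix = tag_prefix
    @flush_size = flush_size
    @transport = transport # any object responding to #call(batch)
    @buffer = []
  end

  def event(label, record, time: Time.now)
    @buffer << [time.to_i, "#{@tag_prefix}.#{label}", record]
    flush if @buffer.size >= @flush_size
  end

  # Hand the buffered events to the transport; keep them if sending fails.
  def flush
    return if @buffer.empty?
    @transport.call(@buffer.dup)
    @buffer.clear
  rescue IOError
    # Collector unreachable: leave the events buffered for the next attempt.
  end

  def pending
    @buffer.size
  end
end

sent = []
logger = SketchLogger.new("myapp", flush_size: 2, transport: ->(batch) { sent.concat(batch) })
logger.event("login", { "user" => 38 })
logger.event("logout", { "user" => 38 })

sent.each { |time, tag, record| puts "#{Time.at(time).utc} #{tag} #{JSON.generate(record)}" }
```

Buffering on the client side is a design choice: it trades a little memory for the guarantee that logging calls return quickly.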

Page 26

Typical Log Collection by `rsync`

[Diagram: three app servers, each with an Application writing log files (File File File ...), copied to a single Log server]

Burst of traffic: rsync consumes all bandwidth
High latency: must wait for a day
Hard to analyze: complex text parsers

Page 28

Log Collection using Fluentd

[Diagram: Fluentd nodes forward to aggregating Fluentd nodes, which write to Hadoop / Hive, MongoDB, and Amazon S3 / EMR]

Realtime!
Ready to Analyze!

Page 29

Fluentd Case Study

[Diagram: Ruby on Rails servers -> Fluentd -> Fluentd (routing) -> Hadoop / Hive and MongoDB]

Log types: PV logs, user behavior logs

✓ 127 RoR servers
✓ 100,000 msgs/sec
✓ 120Mbps at peak
✓ 1TB/day
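The case-study figures can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and assumes the message rate is also a peak figure; the slide states neither, so the derived numbers are rough estimates only.

```ruby
peak_bits_per_sec = 120 * 1_000_000 # "120Mbps at peak"
msgs_per_sec      = 100_000         # "100,000 msgs/sec" (assumed to be a peak rate)
bytes_per_day     = 10**12          # "1TB/day" (assumed decimal terabyte)

peak_bytes_per_sec = peak_bits_per_sec / 8
bytes_per_msg      = peak_bytes_per_sec / msgs_per_sec

avg_bytes_per_sec = bytes_per_day / 86_400.0
avg_mbps          = avg_bytes_per_sec * 8 / 1_000_000

puts "peak throughput: #{peak_bytes_per_sec} bytes/sec"
puts "implied message size: #{bytes_per_msg} bytes"
puts format("average rate for 1TB/day: %.1f Mbps", avg_mbps)
```

Under these assumptions the daily volume corresponds to an average rate somewhat below the quoted peak, so the three figures are mutually consistent.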

Page 30

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  host 127.0.0.1
</match>

# forward other logs to servers
# (load-balancing + fail-over)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>
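How such a configuration routes events can be sketched in Ruby. This is a simplified model, not Fluentd's matcher: it assumes the first matching `<match>` block wins, supports only literal tag parts, `*` for exactly one part, and a standalone `**` catch-all, and treats the `weight` values as relative traffic shares. The helper names are invented for illustration.

```ruby
# Simplified tag matcher (an illustration, not Fluentd's implementation).
def tag_match?(pattern, tag)
  return true if pattern == "**" # catch-all
  pattern_parts = pattern.split(".")
  tag_parts = tag.split(".")
  return false unless pattern_parts.size == tag_parts.size
  pattern_parts.zip(tag_parts).all? { |p, t| p == "*" || p == t }
end

# Match rules in the same order as the configuration on the slide.
ROUTES = [
  ["apache.access", :mongo],
  ["**", :forward]
].freeze

# Return the output of the first rule whose pattern matches the tag.
def route(tag)
  rule = ROUTES.find { |pattern, _output| tag_match?(pattern, tag) }
  rule && rule.last
end

# Forward weights from the slide, read as relative shares of traffic.
WEIGHTS = { "192.168.0.11" => 20, "192.168.0.12" => 60 }.freeze
total = WEIGHTS.values.sum
shares = WEIGHTS.transform_values { |weight| weight.fdiv(total) }

puts route("apache.access").inspect
puts route("myapp.login").inspect
puts shares.inspect
```

Because rules are checked in order, the catch-all has to come last; placed first, it would capture the Apache access logs as well.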

Page 31

Comparison

Page 32

Scribe: log collector by Facebook

[Diagram: frontend servers running scribe -> aggregator nodes running scribe -> Hadoop HDFS]

Page 33

Scribe’s Pros & Cons

• Pros
  • Fast (written in C++)

• Cons
  • VERY HARD to install
    • nightmare of boost, thrift, libhdfs, etc.
  • Unstructured logs
    • parsing is required before analysis
  • Hard to extend
    • recompiling the C++ program is required
  • No longer maintained

Page 34

Fluentd vs Scribe

• Easy to install
  • “gem install fluentd”
  • Stable RPM and Deb packages
    • http://packages.treasure-data.com/
• Easy to write plugins
  • you can use Ruby
• Easy plugin distribution
  • “gem search -rd fluent-plugin”

Page 35

Flume: distributed log collector by Cloudera

[Diagram: Flume nodes -> Hadoop HDFS; the Flume Master manages the physical topology and the logical topology]

Page 36

Flume’s Pros & Cons

• Pros
  • Central master server manages all nodes

• Cons
  • Difficult to understand
    • logical topologies, physical servers, and a configuration of the logical/physical mapping
  • Difficult to configure
    • replicated master servers, log servers, and agents
  • Big footprint
    • 50,000 lines of Java

Page 37

Fluentd vs Flume

• Easy to understand
  • “syslogd that understands JSON”
• Easy to set up
  • “sudo fluentd --setup && fluentd”
• Very small footprint
  • small engine (3,000 lines) + plugins
  • small, but battle-tested!
• Easy to configure

Page 38

                     Fluentd              Scribe              Flume
Installation         gem/rpm/deb          make                jar/rpm/deb
Footprint            3,000 lines of Ruby  8,000 lines of C++  50,000 lines of Java
Plugin               Ruby                 N/A                 Java
Plugin distribution  RubyGems.org         N/A                 N/A
Master Server        No                   No                  Yes
License              Apache License       Apache License      Apache License

Page 39

Fluentd Plugin for MongoDB

Page 40

fluent-plugin-mongo

• Included within rpm/deb by default!
  • http://github.com/fluent/fluent-plugin-mongo
• #1 plugin among 50+ Fluentd plugins
  • Logs As JSON. WHY NOT Put Them Into Mongo??
  • http://fluentd.org/plugin/
• Supports most of the MongoDB features
  • Authentication
  • ReplicaSet
  • Capped Collection

Page 41

• MongoDB Output Plugin
  • Maintain JSON Structure
  • Reliable Buffering
  • Batch Insertion
  • Handle Broken Records
• Ruby Driver #82

[Diagram: Application -> Fluentd (buffering) -> MongoDB, with authentication; targets: single instance (capped or not), ReplicaSet, sharding]
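Buffering, batch insertion, and broken-record handling fit together roughly as in the sketch below. It does not reproduce the plugin's code or the MongoDB driver: `FakeCollection` is an in-memory stand-in with an invented validation rule, and retrying a failed batch one record at a time is only one possible strategy, assumed here for illustration.

```ruby
# In-memory stand-in for a MongoDB collection (hypothetical; no driver involved).
class FakeCollection
  attr_reader :docs

  def initialize
    @docs = []
  end

  # Reject the whole batch if any document is invalid.
  def insert_many(batch)
    bad = batch.index { |doc| !valid?(doc) }
    raise ArgumentError, "invalid document at position #{bad}" if bad
    @docs.concat(batch)
  end

  private

  # Stand-in rule: a document must be a Hash with String keys.
  def valid?(doc)
    doc.is_a?(Hash) && doc.keys.all? { |key| key.is_a?(String) }
  end
end

# Buffers records and writes them in batches; broken records are set aside
# so that one bad record does not block the rest.
class BufferedOutput
  attr_reader :broken

  def initialize(collection, batch_size:)
    @collection = collection
    @batch_size = batch_size
    @buffer = []
    @broken = []
  end

  def emit(record)
    @buffer << record
    flush if @buffer.size >= @batch_size
  end

  def flush
    return if @buffer.empty?
    batch = @buffer.shift(@buffer.size)
    begin
      @collection.insert_many(batch)
    rescue ArgumentError
      # Batch rejected: retry one record at a time and keep the rejects.
      batch.each do |record|
        begin
          @collection.insert_many([record])
        rescue ArgumentError
          @broken << record
        end
      end
    end
  end
end

collection = FakeCollection.new
output = BufferedOutput.new(collection, batch_size: 3)
output.emit({ "path" => "/", "code" => 200 })
output.emit("not a document")
output.emit({ "path" => "/buyItem", "code" => 200 })
output.emit({ "path" => "/landing", "code" => 404 })
output.flush

puts "stored: #{collection.docs.size}, broken: #{output.broken.size}"
```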

Page 43

• MongoDB Input Plugin
  • Tailing Capped Collections

[Diagram: MongoDB (capped collection) -> Fluentd (buffering), with authentication; sources: single instance (capped collection), ReplicaSet (capped collection)]
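The idea of tailing a capped collection can be imitated in memory. The sketch below is an analogy only: `CappedBuffer` and `Tailer` are invented names, the buffer mimics a fixed-size collection that overwrites its oldest documents, and no MongoDB cursor is involved.

```ruby
# In-memory imitation of a capped collection: fixed capacity, insertion order
# preserved, oldest documents dropped first. (Illustration only.)
class CappedBuffer
  def initialize(capacity)
    @capacity = capacity
    @docs = [] # pairs of [sequence number, document]
    @next_seq = 0
  end

  def insert(doc)
    @docs << [@next_seq, doc]
    @next_seq += 1
    @docs.shift if @docs.size > @capacity
  end

  # Documents inserted after the given sequence number that still exist.
  def since(seq)
    @docs.select { |s, _doc| s > seq }
  end
end

# Tailing reader: remembers its last position and returns only new documents.
class Tailer
  def initialize(source)
    @source = source
    @last_seq = -1
  end

  def poll
    fresh = @source.since(@last_seq)
    @last_seq = fresh.last.first unless fresh.empty?
    fresh.map { |_seq, doc| doc }
  end
end

capped = CappedBuffer.new(3)
tailer = Tailer.new(capped)

capped.insert({ "msg" => "a" })
capped.insert({ "msg" => "b" })
first = tailer.poll

%w[c d e f].each { |m| capped.insert({ "msg" => m }) }
second = tailer.poll

puts first.inspect
puts second.inspect
```

In this model a reader that polls too slowly misses documents that were overwritten in the meantime, which is why the capacity has to be sized against the polling interval.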

Page 45

Realtime Analytics with Fluentd + MongoDB

[Diagram: App servers -> Fluentd -> Fluentd (routing) -> MongoDB; MongoDB is queried for charting and for alerts (Nagios, Zabbix, etc.)]

Page 46

Realtime or Batch? No, BOTH!

[Diagram: App servers -> Fluentd -> Fluentd (routing). Realtime: MongoDB, queried for charting. Batch: Hadoop / Hive. Archive: Amazon S3]

Page 47

Intro of our company’s service: Treasure Data

[Diagram: App servers -> Fluentd -> Fluentd (routing). Realtime: MongoDB. Batch: Treasure Data, a Hadoop-based Cloud Data Warehouse]

Page 48

Exercise: Apache Logs into MongoDB

Page 49

Log File

Page 52

Conclusion

• Log Everything as JSON
  • Machine Readability
  • Schema Freeness
• MongoDB fits into Fluentd’s backend perfectly
  • Both using JSON representation