Fluentd - Flexible, Stable, Scalable

Suiting @Taipei.py

Who am I

Suiting (@suitingtseng)

Gogolook Inc.

Data Team

Before

What is Fluentd?

• Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data.

• Treasure Data: td-agent (the packaged distribution of Fluentd)

What is a log?

Log definition

Time + Tag + Content
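As a rough illustration of the Time + Tag + Content definition, a single log event could be modelled as below. This is only a sketch: the `LogEvent` and `make_event` names and the sample record are made up for the example, and JSON is used purely for readability (Fluentd itself serializes with MessagePack).

```python
import json
import time
from typing import NamedTuple


class LogEvent(NamedTuple):
    """One log event: when it happened, how to route it, and what it says."""
    time: int     # Unix timestamp in seconds
    tag: str      # dot-separated routing key, e.g. "nginx.access"
    record: dict  # the content, as key/value pairs


def make_event(tag, record, timestamp=None):
    """Build an event, defaulting the time to now (hypothetical helper)."""
    if timestamp is None:
        timestamp = int(time.time())
    return LogEvent(time=timestamp, tag=tag, record=record)


def to_json(event):
    """Render the event as JSON for display only."""
    return json.dumps(event._asdict())


example = make_event("nginx.access", {"path": "/", "status": 200}, timestamp=1400000000)
print(to_json(example))
```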

After

How?

• Lightweight: C + Ruby + MessagePack

• Pluggable architecture

• Built-in Reliability

Input plugins

• forward

• tail

• AWS Simple Queue Service

• AWS CloudWatch

input: tail

$ cat /etc/td-agent/conf.d
<source>
  type     tail
  path     /var/log/nginx/access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag      nginx.access
</source>
<match nginx.access>
  blah blah
</match>
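The `<match nginx.access>` block receives events whose tag matches its pattern. Besides exact tags, Fluentd patterns allow `*` (exactly one tag part) and `**` (zero or more tag parts). The function below is a simplified sketch of that matching in Python; `tag_matches` is a name invented here, and it ignores other pattern forms such as `{a,b}` and multiple patterns per `<match>`.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches a simplified match pattern."""

    def match(pattern_parts, tag_parts):
        if not pattern_parts:
            # Pattern exhausted: it matches only if the tag is exhausted too.
            return not tag_parts
        head, rest = pattern_parts[0], pattern_parts[1:]
        if head == "**":
            # '**' may swallow zero or more tag parts.
            return any(match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
        if not tag_parts:
            return False
        if head == "*" or head == tag_parts[0]:
            # '*' matches exactly one part; a literal must be equal.
            return match(rest, tag_parts[1:])
        return False

    return match(pattern.split("."), tag.split("."))


print(tag_matches("nginx.access", "nginx.access"))
print(tag_matches("nginx.*", "nginx.access.v1"))
```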

input: forward

$ cat /etc/td-agent/conf.d
<source>
  type forward
  port 24224
</source>
<match flask.index>
  blah blah
</match>

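input: forward

$ cat ~/example.py
from fluent import sender
from fluent import event

# Send events to the forward input on localhost:24224, using 'flask' as the tag prefix.
sender.setup('flask', host='localhost', port=24224)

# Emitted with the tag 'flask.index', which <match flask.index> picks up.
event.Event("index", {
    "user": "foo",
    "token": "bar",
    "action": "POST"
})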

Output plugins

• forward

• copy

• Elasticsearch / MongoDB

• statsd / InfluxDB / Graphite

• S3 / GCS / BigQuery

output: elasticsearch

$ cat /etc/td-agent/conf.d
<source>
  foo            bar
  tag            nginx.access
</source>
<match nginx.access>
  type           elasticsearch
  hosts          es-host1,es-host2
  index_name     nginx
  type_name      access
  flush_interval 60s
</match>

output: splunk

$ cat /etc/td-agent/conf.d
<source>
  foo   bar
  tag   nginx.access
</source>
<match nginx.access>
  type  splunk
  hosts splunk-host1
</match>

Filter plugins

• grok

• grep

• record-modifier / record-reformer

• geoip

Buffer types

• Memory

• File

Buffer example

$ cat /etc/td-agent/conf.d
<source>
  foo                bar
  tag                nginx.access
</source>
<match nginx.access>
  type               splunk
  hosts              splunk-host1
  buffer_chunk_limit 10m
  buffer_queue_limit 1000
  flush_interval     5m
</match>
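To make the two limits concrete: with `buffer_chunk_limit 10m` and `buffer_queue_limit 1000`, the output can hold up to 1000 chunks of 10 MiB each, roughly 10 GB, before the queue is full. The toy buffer below sketches the idea in Python. It is not Fluentd's implementation; `ChunkedBuffer`, `BufferFullError`, and their methods are names invented for this example.

```python
class BufferFullError(Exception):
    """Raised when the queue already holds the maximum number of chunks."""


class ChunkedBuffer:
    def __init__(self, chunk_limit, queue_limit):
        self.chunk_limit = chunk_limit  # max bytes per chunk
        self.queue_limit = queue_limit  # max number of queued chunks
        self.queue = []                 # full chunks waiting to be flushed
        self.current = b""              # chunk currently being filled

    def append(self, data):
        # Close the current chunk when the new data would overflow it.
        if self.current and len(self.current) + len(data) > self.chunk_limit:
            if len(self.queue) >= self.queue_limit:
                raise BufferFullError("buffer queue is full")
            self.queue.append(self.current)
            self.current = b""
        self.current += data

    def flush(self, write):
        """Write queued chunks in order; a chunk is dropped only after write succeeds."""
        if self.current:
            self.queue.append(self.current)
            self.current = b""
        while self.queue:
            write(self.queue[0])
            self.queue.pop(0)


# Worst-case buffered size for the settings on the slide (10m is 10 * 1024 * 1024 bytes).
max_buffered_bytes = 10 * 1024 * 1024 * 1000
print(max_buffered_bytes)
```

In a real deployment a flush would be triggered on a timer (`flush_interval`), and a failed write would leave the chunk queued for a later retry.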

Scalability

• Scale up: multi-process plugin

• Scale out: out-forward plugin
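Scaling out means each application node forwards to one of several aggregator nodes. The sketch below shows one simple way to spread events: round-robin with failover to the next server. It is an illustration only, not the algorithm the out-forward plugin uses; `RoundRobinForwarder` and the `deliver` callback are hypothetical names.

```python
class RoundRobinForwarder:
    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index of the server to try first on the next send

    def send(self, event, deliver):
        """Try each server once, starting at the round-robin position.

        `deliver(server, event)` is assumed to raise ConnectionError on failure.
        Returns the server that accepted the event.
        """
        for offset in range(len(self.servers)):
            index = (self._next + offset) % len(self.servers)
            server = self.servers[index]
            try:
                deliver(server, event)
            except ConnectionError:
                continue  # fail over to the next server
            self._next = (index + 1) % len(self.servers)
            return server
        raise ConnectionError("no aggregator reachable")
```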

[Diagrams, built up over four slides; only the labels survive:]

• One App + Fluentd node, a Fluentd node, and four Elasticsearch nodes

• Two App + Fluentd nodes, two Fluentd nodes, and four Elasticsearch nodes

• Three App + Fluentd nodes, two Fluentd nodes, and four Elasticsearch nodes

• Three App + Fluentd nodes in an auto scaling group, with load balancing in front of the Fluentd nodes

Stability

• Auto retry

• Persistent file buffer

• At-most-once delivery
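Auto retry can be pictured as a flush loop that waits longer after each failure. The following is a minimal sketch of retry with exponential backoff, not Fluentd's code; `flush_with_retry` and its parameters are assumptions made for the example, and `sleep` is injectable so the loop can be exercised without actually waiting.

```python
import time


def flush_with_retry(write, chunk, retry_wait=1.0, max_retries=5, sleep=time.sleep):
    """Call write(chunk) until it succeeds, doubling the wait after each failure.

    Returns the number of failed attempts before the successful one.
    Re-raises the last error once max_retries retries have been used up.
    """
    wait = retry_wait
    for attempt in range(max_retries + 1):
        try:
            write(chunk)
            return attempt
        except IOError:
            if attempt == max_retries:
                raise
            sleep(wait)
            wait *= 2
```

Combined with a file buffer, the chunk being retried survives a process restart, because it stays on disk until a write succeeds.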

Message Delivery

• At-most-once: data may be lost

• At-least-once: data may be duplicated

• Exactly-once: ideal; every message arrives once, with no loss and no duplicates
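The difference between the first two guarantees comes down to whether the sender retries when it gets no acknowledgement. The toy model below scripts what happens to each attempt; the `deliver` function and the outcome names (`"ok"`, `"lost"`, `"ack_lost"`) are invented for this sketch.

```python
def deliver(outcomes, message, retry):
    """Return the copies of `message` that the receiver ends up with.

    `outcomes` scripts each send attempt:
      "ok"       - message delivered and acknowledged
      "lost"     - message lost in transit
      "ack_lost" - message delivered, but the acknowledgement is lost
    With retry=False the sender sends once and never looks back (at-most-once).
    With retry=True it resends until it sees an acknowledgement (at-least-once).
    """
    received = []
    for outcome in outcomes:
        if outcome != "lost":
            received.append(message)
        if outcome == "ok" or not retry:
            break
    return received


print(deliver(["lost", "ok"], "m", retry=False))     # at-most-once: nothing arrives
print(deliver(["ack_lost", "ok"], "m", retry=True))  # at-least-once: two copies arrive
```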

Idempotent

• HTTP PUT

• Maintain a unique ID at the application level, or

• Concatenate (instance-id, time, …) as the ID
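With at-least-once delivery, an idempotent write turns duplicates into harmless overwrites, much like repeating an HTTP PUT to the same URL. The sketch below builds an ID by concatenation and writes to a keyed store. The `sequence` field is an assumption added here so that two events from the same instance in the same second stay distinct; `make_event_id` and `IdempotentStore` are hypothetical names.

```python
def make_event_id(instance_id, timestamp, sequence):
    """Concatenate identifying fields into one ID (the field choice is an assumption)."""
    return f"{instance_id}-{timestamp}-{sequence}"


class IdempotentStore:
    """Keyed store where writing the same ID twice leaves a single row."""

    def __init__(self):
        self.rows = {}

    def put(self, event_id, record):
        # Overwrite-by-key: a duplicate delivery replaces the row with itself.
        self.rows[event_id] = record


store = IdempotentStore()
event_id = make_event_id("i-0abc", 1400000000, 7)
store.put(event_id, {"action": "POST"})
store.put(event_id, {"action": "POST"})  # duplicate from a retry
print(len(store.rows))
```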

Gogolook use cases

• MongoDB, nginx log

• API, worker log

• Monitor

• Benchmark

Active users by day

System monitor

Queue monitor

Benchmark?

[Diagram, built up over three slides: App + Fluentd, Fluentd, and DB, with local files added in the later builds]

Q & A