Fluentd Unified Logging Layer At Fossasia

Post on 14-Jul-2015



Masahiro Nakagawa, Mar 14, 2015

Fossasia 2015

Fluentd: Unified logging layer

Who am I

> Masahiro Nakagawa
> github: @repeatedly

> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer

> Living at OSS :)
> D language: Phobos (a.k.a. the standard library) committer
> Fluentd: main maintainer
> MessagePack / RPC: D and Python (RPC only)
> The organizer of several meetups (Presto, DTM, etc.)
> etc.

Structured logging !

Reliable forwarding !

Pluggable architecture

http://fluentd.org/

github:fluent/fluentd

What’s Fluentd?

> Data collector for a unified logging layer
> Streaming data transfer based on JSON
> Simple core + plugins written in Ruby

> Various gem-based plugins
> http://www.fluentd.org/plugins

> List of users
> http://www.fluentd.org/testimonials

Before

✓ Duplicated code for error handling...
✓ Messy code for retrying mechanisms...

So painful!

After

Concept / Design

Core Plugins

> Divide & Conquer

> Buffering & Retrying

> Error handling

> Message routing

> Parallelism

> Read / receive data
> Parse data
> Filter data
> Buffer data
> Format data
> Write / send data


The core handles the common concerns; plugins handle the use-case-specific steps.

Event structure (log message)

✓ Tag
> Where is it from?
> For message routing

✓ Time
> Second unit by default
> From the data source

✓ Record
> JSON format
> MessagePack internally
> Schema-free
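The triple above can be sketched in plain Ruby. This is an illustrative representation, not Fluentd's internal classes; the field values are invented for the example:

```ruby
require "json"

# A Fluentd event is a triple: tag (routing key), time (seconds),
# and record (a schema-free, JSON-compatible hash).
event = {
  tag:    "backend.apache",            # matched by <match> for routing
  time:   Time.utc(2015, 3, 14).to_i,  # second resolution by default
  record: { "method" => "GET", "path" => "/", "code" => 200 }
}

# Records are JSON at the edges; Fluentd uses MessagePack internally.
puts JSON.generate(event[:record])
```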

Reliable streaming data transfer

(diagram: transfers that fail are retried until they succeed, for both stream and batch; a stream is relayed to other streams as micro batches)

(diagram: Fluentd between log sources and destinations)

Sources: access logs (Apache, frontend), app logs, system logs (syslogd, backend), databases
Destinations: Nagios (alerting), PostgreSQL / Hadoop (analysis), Amazon S3 (archiving), Elasticsearch

With Fluentd in the middle doing the buffering / retrying / routing, M sources and N destinations need M + N plugins instead of M x N point-to-point integrations.

Use case

Simple forwarding

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>

Less Simple Forwarding

- At-most-once / At-least-once
- HA (failover)
- Load-balancing
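These modes map onto the built-in forward output. A sketch with placeholder hostnames; require_ack_response (v0.12) switches the default at-most-once delivery to at-least-once:

```
# forward to aggregators with load balancing and failover
<match backend.*>
  type forward
  require_ack_response true   # at-least-once; omit for at-most-once
  <server>
    host aggregator1.example.com
    port 24224
  </server>
  <server>
    host aggregator2.example.com
    port 24224
  </server>
  <server>
    host backup.example.com   # used only when the primaries fail
    port 24224
    standby
  </server>
</match>
```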

Near-realtime and batch combo: hot data for near-realtime queries, all data for batch processing.

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to Elasticsearch and HDFS
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>

CEP for Stream Processing

Norikra is a SQL based CEP engine: http://norikra.github.io/

Container Logging

> Kubernetes


> Google Compute Engine
> https://cloud.google.com/logging/docs/install/compute_install

Fluentd on Kubernetes / GCE

Slideshare

http://engineering.slideshare.net/2014/04/skynet-project-monitor-scale-and-auto-heal-a-system-in-the-cloud/

Log Analysis System and Its Designs at LINE Corp. (early 2014)

Architecture

Internal Architecture

Input → Parser → Filter → Buffer → Formatter → Output

The "input-ish" half (Input, Parser) reads and parses data; the "output-ish" half (Buffer, Formatter, Output) buffers, formats, and writes it.

Input plugins

File tail (in_tail)
Syslog (in_syslog)
HTTP (in_http)
HTTP/2 (in_http2, WIP)
...

✓ Receive logs

✓ Or pull logs from data sources

✓ non-blocking


Parser plugins

JSON, Regexp, Apache/Nginx/Syslog, CSV/TSV, etc.

✓ Parse into JSON

✓ Common formats out of the box

✓ Some input plugins depend on parser plugins

✓ v0.10.46 and above


Filter plugins

grep, record_transformer, suppress, …

✓ Filter / Mutate record

✓ Record level and Stream level

✓ v0.12 and above

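The v0.12 filter syntax can be sketched as follows; the tag and field names are illustrative:

```
# drop health-check noise from the stream
<filter web.access>
  type grep
  exclude1 path ^/healthcheck$
</filter>

# enrich each record with the collector's hostname
<filter web.access>
  type record_transformer
  <record>
    hostname ${hostname}
  </record>
</filter>
```

Unlike match-based rewriting, filters run in place on the stream, so no tag rewriting round-trip is needed.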

Buffer plugins

✓ Improve performance

✓ Provide reliability

✓ Provide thread safety

Memory (buf_memory), File (buf_file)


Buffer internal

✓ Chunk = adjustable unit of data

✓ Buffer = Queue of chunks

(diagram: Input → chunk | chunk | chunk → Output; the buffer is a queue of chunks drained by the output)
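Chunk size and flush timing are tunable on any buffered output. A sketch reusing the MongoDB example; the values are illustrative:

```
<match backend.*>
  type mongo
  database fluent
  collection test
  buffer_type file                        # spill chunks to disk for durability
  buffer_path /var/log/td-agent/mongo.buffer
  buffer_chunk_limit 8m                   # maximum size of one chunk
  buffer_queue_limit 64                   # maximum number of queued chunks
  flush_interval 60s                      # enqueue the current chunk every minute
  retry_wait 1s                           # base wait for exponential retry backoff
</match>
```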

Formatter plugins

✓ Format output

✓ Some output plugins depend on formatter plugins

✓ v0.10.46 and above

JSON, CSV/TSV, "single value", msgpack


Output plugins

✓ Write to external systems

✓ Buffered & Non-buffered

✓ 200+ plugins

File (out_file), Amazon S3 (out_s3), MongoDB (out_mongo), ...

Roadmap

> v0.10 (old stable)
> v0.12 (current stable): Filter / Label / At-least-once
> v0.14 (spring 2015): new plugin APIs, ServerEngine, Time…
> v1 (summer 2015): fix new features / APIs

https://github.com/fluent/fluentd/wiki/V1-Roadmap

Goodies

fluent-bit

> Made for Embedded Linux
> OpenEmbedded & Yocto Project
> Intel Edison, RasPi & BeagleBone Black boards
> https://github.com/fluent/fluent-bit
> Standalone application or library mode
> Built-in plugins (input: cpu, kmsg; output: fluentd)
> First release at the end of Mar 2015

fluentd-ui

> Manage Fluentd instance via Web UI > https://github.com/fluent/fluentd-ui

Treasure Agent (td-agent)

> Treasure Data distribution of Fluentd, including Ruby and QA'ed plugins
> Treasure Agent 2 is the current stable; we recommend v2, not v1
> Includes fluentd-ui
> The next release, 2.2.0, uses Fluentd v0.12

Embulk

> Bulk-loader counterpart of Fluentd
> Pluggable architecture
> JRuby, JVM languages
> High-performance parallel processing
> Share your script as a plugin
> https://github.com/embulk

http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed
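An Embulk job is driven by a YAML config. A minimal sketch; the paths and column definitions are placeholders:

```yaml
# config.yml: bulk-load local CSV files to stdout
in:
  type: file
  path_prefix: /data/csv/sample_
  parser:
    type: csv
    columns:
      - {name: id, type: long}
      - {name: name, type: string}
out:
  type: stdout
```

A typical flow is `embulk guess seed.yml -o config.yml` to infer the parser settings, then `embulk run config.yml` to execute the load.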

(diagram: Embulk bulk-loads between data stores via input/output plugins: CSV files, SequenceFile, Salesforce.com, MySQL, HDFS, Amazon S3, Elasticsearch, Cassandra, Hive, Redis)

✓ Parallel execution
✓ Data validation
✓ Error recovery
✓ Deterministic behaviour
✓ Idempotent retrying

Check: treasuredata.com, a cloud service for the entire data pipeline.