Fluentd Unified Logging Layer At Fossasia

Masahiro Nakagawa Mar 14, 2015 Fossasia 2015 Fluentd Unified logging layer

Transcript of Fluentd Unified Logging Layer At Fossasia

Page 1: Fluentd Unified Logging Layer At Fossasia

Fluentd: Unified logging layer

Masahiro Nakagawa, Mar 14, 2015

Fossasia 2015

Page 2: Fluentd Unified Logging Layer At Fossasia

Who am I

> Masahiro Nakagawa
  > github: @repeatedly

> Treasure Data, Inc.
  > Senior Software Engineer
  > Fluentd / td-agent developer

> Living at OSS :)
  > D language - Phobos, a.k.a. standard library, committer
  > Fluentd - Main maintainer
  > MessagePack / RPC - D and Python (only RPC)
  > The organizer of several meetups (Presto, DTM, etc…)
  > etc…

Page 3: Fluentd Unified Logging Layer At Fossasia

Structured logging !

Reliable forwarding !

Pluggable architecture

http://fluentd.org/

github:fluent/fluentd

Page 4: Fluentd Unified Logging Layer At Fossasia

What’s Fluentd?

> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Simple core + plugins written in Ruby

> Various gem-based plugins
> http://www.fluentd.org/plugins

> List of users
> http://www.fluentd.org/testimonials

Page 5: Fluentd Unified Logging Layer At Fossasia

Before

✓ duplicated code for error handling...
✓ messy code for retrying mechanism...

Page 6: Fluentd Unified Logging Layer At Fossasia

So painful!

Page 7: Fluentd Unified Logging Layer At Fossasia

After

Page 8: Fluentd Unified Logging Layer At Fossasia

Concept / Design

Page 9: Fluentd Unified Logging Layer At Fossasia

Core

> Divide & Conquer
> Buffering & Retrying
> Error handling
> Message routing
> Parallelism

Plugins

> Read / receive data
> Parse data
> Filter data
> Buffer data
> Format data
> Write / send data

Page 10: Fluentd Unified Logging Layer At Fossasia

Core (Common Concerns)

> Divide & Conquer
> Buffering & Retrying
> Error handling
> Message routing
> Parallelism

Plugins (Use Case Specific)

> Read / receive data
> Parse data
> Filter data
> Buffer data
> Format data
> Write / send data

Page 11: Fluentd Unified Logging Layer At Fossasia

Event structure (log message)

✓ Time
> second unit by default
> from data source

✓ Tag
> for message routing
> where is it from?

✓ Record
> JSON format
> MessagePack internally
> schema-free
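The three parts of an event can be sketched in a few lines of Python. This is only an illustration: the `Event` class and its `to_json` method are hypothetical names, and the `[tag, time, record]` array layout is an assumption made for the example. JSON is used instead of MessagePack so that the sketch needs nothing beyond the standard library.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Event:
    """One log event: a tag for routing, a time in seconds, a schema-free record."""
    tag: str
    record: Dict[str, Any]
    # Defaults to "now" in whole seconds when the data source gives no time.
    time: int = field(default_factory=lambda: int(time.time()))

    def to_json(self) -> str:
        # Assumed layout for illustration: [tag, time, record].
        return json.dumps([self.tag, self.time, self.record], sort_keys=True)


# 1426291200 is 2015-03-14 00:00:00 UTC, chosen only as a fixed example value.
event = Event(tag="backend.apache", time=1426291200,
              record={"method": "GET", "code": 200})
print(event.to_json())
```

The record is a plain dictionary, so no schema has to be declared before new keys appear.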

Page 12: Fluentd Unified Logging Layer At Fossasia

Reliable streaming data transfer

[Diagram: batch transfer versus stream (micro batch) transfer, with error and retry markers; labels: Batch, Stream, Other stream]
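Retrying one small chunk at a time might look like the following Python sketch. It assumes a doubling wait between attempts; `send_with_retry`, `retry_limit`, `initial_wait` and `flaky_send` are hypothetical names for this example, not Fluentd APIs. The `sleep` argument defaults to a no-op so the example runs instantly; a real sender would pass `time.sleep`.

```python
from typing import Callable, List


def send_with_retry(chunk: List[dict],
                    send: Callable[[List[dict]], None],
                    retry_limit: int = 5,
                    initial_wait: float = 1.0,
                    sleep: Callable[[float], None] = lambda seconds: None) -> List[float]:
    """Send one chunk, doubling the wait after each failed attempt.

    Returns the waits that were taken. Re-raises the last error once
    retry_limit retries have failed.
    """
    waits: List[float] = []
    wait = initial_wait
    while True:
        try:
            send(chunk)
            return waits
        except IOError:
            if len(waits) >= retry_limit:
                raise
            sleep(wait)
            waits.append(wait)
            wait *= 2


attempts = {"n": 0}


def flaky_send(chunk: List[dict]) -> None:
    # Hypothetical destination that is unavailable for the first two attempts.
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise IOError("destination unavailable")


waits = send_with_retry([{"msg": "hello"}], flaky_send)
print(waits)  # [1.0, 2.0]
```

Because only the failed chunk is retried, a failure in one stream does not force other streams to resend their data.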

Page 13: Fluentd Unified Logging Layer At Fossasia

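[Diagram: Fluentd sits between data sources and destinations and provides buffering / retrying / routing]

Sources: Apache, Frontend, Backend, syslogd, Databases (access logs, app logs, system logs)

Destinations: Nagios, PostgreSQL, Hadoop, Amazon S3, Elasticsearch (alerting, analysis, archiving)

M x N → M + N plugins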
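The M x N → M + N claim is simple arithmetic, shown here as a small Python sketch. The source and destination lists are taken from the diagram labels; the function names are hypothetical.

```python
sources = ["Apache", "Frontend", "Backend", "syslogd", "Databases"]
destinations = ["Nagios", "PostgreSQL", "Hadoop", "Amazon S3", "Elasticsearch"]


def point_to_point(m: int, n: int) -> int:
    """Every source needs its own integration with every destination."""
    return m * n


def via_unified_layer(m: int, n: int) -> int:
    """One input plugin per source plus one output plugin per destination."""
    return m + n


m, n = len(sources), len(destinations)
print(point_to_point(m, n))     # 25
print(via_unified_layer(m, n))  # 10
```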

Page 14: Fluentd Unified Logging Layer At Fossasia

Use case

Page 15: Fluentd Unified Logging Layer At Fossasia

Simple forwarding

Page 16: Fluentd Unified Logging Layer At Fossasia

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
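How a pattern such as `backend.*` selects events by tag can be sketched in Python. This is a simplified model, not Fluentd's matcher: it assumes `*` matches exactly one dot-separated tag part and `**` matches zero or more parts, and it ignores other pattern forms. `tag_matches` is a hypothetical helper name.

```python
from typing import List


def _match_parts(pattern: List[str], tag: List[str]) -> bool:
    if not pattern:
        return not tag
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # Let "**" consume zero or more tag parts.
        return any(_match_parts(rest, tag[i:]) for i in range(len(tag) + 1))
    if not tag:
        return False
    if head == "*" or head == tag[0]:
        return _match_parts(rest, tag[1:])
    return False


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True when the dot-separated tag fits the pattern."""
    return _match_parts(pattern.split("."), tag.split("."))


print(tag_matches("backend.*", "backend.apache"))          # True
print(tag_matches("backend.*", "backend.apache.access"))   # False
print(tag_matches("backend.**", "backend.apache.access"))  # True
```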

Page 17: Fluentd Unified Logging Layer At Fossasia

Less Simple Forwarding

- At-most-once / At-least-once
- HA (failover)
- Load-balancing

Page 18: Fluentd Unified Logging Layer At Fossasia

Near realtime and batch combo!

[Diagram labels: Hot data, All data]

Page 19: Fluentd Unified Logging Layer At Fossasia

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
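The idea behind `type copy` with several `<store>` sections is a fan-out: each event is handed to every store. A minimal Python sketch of that idea follows. The in-memory lists stand in for Elasticsearch and HDFS, `make_copy_output` is a hypothetical name, and giving each store its own copy of the record is a choice made for this sketch only.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Store = Callable[[str, Record], None]


def make_copy_output(stores: List[Store]) -> Store:
    """Return an output that hands every event to each store in order."""
    def emit(tag: str, record: Record) -> None:
        for store in stores:
            # Each store gets its own copy so that one store cannot
            # mutate the record seen by the next one.
            store(tag, dict(record))
    return emit


search_index: List[Record] = []  # stands in for Elasticsearch
archive: List[Record] = []       # stands in for HDFS

copy_output = make_copy_output([
    lambda tag, rec: search_index.append(rec),
    lambda tag, rec: archive.append(rec),
])
copy_output("web.access", {"path": "/", "code": 200})
print(search_index, archive)
```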

Page 20: Fluentd Unified Logging Layer At Fossasia

CEP for Stream Processing

Norikra is a SQL based CEP engine: http://norikra.github.io/

Page 21: Fluentd Unified Logging Layer At Fossasia

Container Logging

Page 22: Fluentd Unified Logging Layer At Fossasia

Fluentd on Kubernetes / GCE

> Kubernetes

> Google Compute Engine
> https://cloud.google.com/logging/docs/install/compute_install

Page 23: Fluentd Unified Logging Layer At Fossasia

Slideshare

http://engineering.slideshare.net/2014/04/skynet-project-monitor-scale-and-auto-heal-a-system-in-the-cloud/

Page 24: Fluentd Unified Logging Layer At Fossasia

Log Analysis System And its designs in LINE Corp. 2014 early

Page 25: Fluentd Unified Logging Layer At Fossasia

Architecture

Page 26: Fluentd Unified Logging Layer At Fossasia

Internal Architecture

Input → Parser → Filter → Buffer → Formatter → Output

Page 27: Fluentd Unified Logging Layer At Fossasia

Internal Architecture

Input → Parser → Filter → Buffer → Formatter → Output

“input-ish” “output-ish”

Page 28: Fluentd Unified Logging Layer At Fossasia

Input plugins

✓ Receive logs
✓ Or pull logs from data sources
✓ non-blocking

File tail (in_tail)
Syslog (in_syslog)
HTTP (in_http)
HTTP/2 (in_http2 WIP)
...

Page 29: Fluentd Unified Logging Layer At Fossasia

Parser plugins

✓ Parse into JSON
✓ Common formats out of the box
✓ Some input plugins depend on parser plugins
✓ v0.10.46 and above

JSON
Regexp
Apache/Nginx/Syslog
CSV/TSV
etc.
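What a regexp-style parser does with an access log line can be sketched in Python. The pattern below is a simplified assumption written for this example, not the regexp used by Fluentd's `apache2` format, and `parse_access_line` is a hypothetical name. It returns the event time in seconds plus a record of named fields.

```python
import re
from datetime import datetime
from typing import Dict, Optional, Tuple

# Simplified pattern for an Apache-style access log line (illustrative only).
ACCESS_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\S+)'
)


def parse_access_line(line: str) -> Optional[Tuple[int, Dict[str, object]]]:
    """Turn one text line into (time, record), or None if it does not match."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    record: Dict[str, object] = dict(match.groupdict())
    timestamp = datetime.strptime(str(record.pop("time")), "%d/%b/%Y:%H:%M:%S %z")
    record["code"] = int(str(record["code"]))
    record["size"] = 0 if record["size"] == "-" else int(str(record["size"]))
    return int(timestamp.timestamp()), record


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
parsed = parse_access_line(line)
print(parsed)
```

Lines that do not fit the pattern return `None`, leaving it to the caller to decide whether to drop or report them.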

Page 30: Fluentd Unified Logging Layer At Fossasia

Filter plugins

✓ Filter / Mutate record
✓ Record level and Stream level
✓ v0.12 and above

grep
record_transformer
suppress
…
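Record-level filtering and mutation can be sketched as a chain of small functions. The Python below is loosely modelled on what `grep` and `record_transformer` do, but the function names and signatures are hypothetical and are not the plugins' configuration or API. A filter returns a record to pass it on, or `None` to drop it.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, object]
Filter = Callable[[Record], Optional[Record]]


def grep(key: str, pattern: str) -> Filter:
    """Keep only records whose value for `key` matches `pattern`."""
    regex = re.compile(pattern)

    def apply(record: Record) -> Optional[Record]:
        return record if regex.search(str(record.get(key, ""))) else None
    return apply


def add_fields(extra: Record) -> Filter:
    """Return a copy of each record with the extra fields merged in."""
    def apply(record: Record) -> Optional[Record]:
        return {**record, **extra}
    return apply


def run_filters(records: Iterable[Record], filters: Iterable[Filter]) -> Iterator[Record]:
    chain = list(filters)
    for record in records:
        current: Optional[Record] = record
        for step in chain:
            current = step(current)
            if current is None:
                break  # dropped by this filter, skip the remaining ones
        if current is not None:
            yield current


records = [{"code": "200", "path": "/"}, {"code": "500", "path": "/api"}]
filtered = list(run_filters(records, [grep("code", "^5"), add_fields({"host": "web01"})]))
print(filtered)  # [{'code': '500', 'path': '/api', 'host': 'web01'}]
```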

Page 31: Fluentd Unified Logging Layer At Fossasia

Buffer plugins

✓ Improve performance
✓ Provide reliability
✓ Provide thread-safety

Memory (buf_memory)
File (buf_file)

Page 32: Fluentd Unified Logging Layer At Fossasia

Buffer internal

✓ Chunk = adjustable unit of data
✓ Buffer = Queue of chunks

[Diagram: Input → chunk, chunk, chunk (queue) → Output]
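"Buffer = queue of chunks" can be sketched in Python as follows. This is a simplification: the chunk limit here counts events, which is an assumption made to keep the example short, and `ChunkBuffer` with its methods is a hypothetical name rather than a Fluentd class.

```python
from collections import deque
from typing import Deque, List, Optional


class ChunkBuffer:
    """Input appends events to the current chunk; output pops full chunks in FIFO order."""

    def __init__(self, chunk_limit: int) -> None:
        self.chunk_limit = chunk_limit           # adjustable unit: max events per chunk
        self.current: List[dict] = []            # chunk still being filled
        self.queue: Deque[List[dict]] = deque()  # chunks waiting for the output side

    def append(self, event: dict) -> None:
        self.current.append(event)
        if len(self.current) >= self.chunk_limit:
            self.flush()

    def flush(self) -> None:
        """Move the current chunk to the queue, even if it is not full."""
        if self.current:
            self.queue.append(self.current)
            self.current = []

    def pop_chunk(self) -> Optional[List[dict]]:
        return self.queue.popleft() if self.queue else None


buffer = ChunkBuffer(chunk_limit=2)
for n in range(5):
    buffer.append({"n": n})
buffer.flush()

sizes = []
while (chunk := buffer.pop_chunk()) is not None:
    sizes.append(len(chunk))
print(sizes)  # [2, 2, 1]
```

Keeping input and output on opposite ends of the queue is what lets a slow destination delay writes without blocking the input side.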

Page 33: Fluentd Unified Logging Layer At Fossasia

Formatter plugins

✓ Format output
✓ Some plugins depend on formatter plugins
✓ v0.10.46 and above

JSON
CSV/TSV
“single value”
msgpack

Page 34: Fluentd Unified Logging Layer At Fossasia

Output plugins

✓ Write to external systems
✓ Buffered & Non-buffered
✓ 200+ plugins

File (out_file)
Amazon S3 (out_s3)
MongoDB (out_mongo)
...

Page 35: Fluentd Unified Logging Layer At Fossasia

Roadmap

> v0.10 (old stable)
> v0.12 (current stable)
  > Filter / Label / At-least-once
> v0.14 (spring, 2015)
  > New plugin APIs, ServerEngine, Time…
> v1 (summer, 2015)
  > Fix new features / APIs

https://github.com/fluent/fluentd/wiki/V1-Roadmap

Page 36: Fluentd Unified Logging Layer At Fossasia

Goodies

Page 37: Fluentd Unified Logging Layer At Fossasia

fluent-bit

> Made for Embedded Linux
  > OpenEmbedded & Yocto Project
  > Intel Edison, RasPi & Beagle Black boards
  > https://github.com/fluent/fluent-bit
> Standalone application or Library mode
> Built-in plugins
  > input: cpu, kmsg, output: fluentd
> First release at the end of Mar 2015

Page 38: Fluentd Unified Logging Layer At Fossasia

fluentd-ui

> Manage Fluentd instance via Web UI
> https://github.com/fluent/fluentd-ui

Page 39: Fluentd Unified Logging Layer At Fossasia

Treasure Agent (td-agent)

> Treasure Data distribution of Fluentd
> including Ruby and QA’ed plugins

> Treasure Agent 2 is the current stable
> We recommend using v2, not v1
> including fluentd-ui

> Next release, 2.2.0, uses fluentd v0.12

Page 40: Fluentd Unified Logging Layer At Fossasia

Embulk

> Bulk Loader version of Fluentd
> Pluggable architecture

> JRuby, JVM languages
> High performance parallel processing

> Share your script as a plugin
> https://github.com/embulk

http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed

Page 41: Fluentd Unified Logging Layer At Fossasia

[Diagram: Embulk bulk loads data between systems through plugins on both sides]

Systems shown: HDFS, MySQL, Amazon S3, CSV Files, SequenceFile, Salesforce.com, Elasticsearch, Cassandra, Hive, Redis

✓ Parallel execution
✓ Data validation
✓ Error recovery
✓ Deterministic behaviour
✓ Idempotent retrying

Page 42: Fluentd Unified Logging Layer At Fossasia

Check: treasuredata.com

Cloud service for the entire data pipeline