Fluentd Unified Logging Layer At Fossasia

Posted 14-Jul-2015 · Category: Technology

Transcript of Fluentd Unified Logging Layer At Fossasia

  • Masahiro Nakagawa, Mar 14, 2015

    Fossasia 2015

    Fluentd: Unified logging layer

  • Who am I

    > Masahiro Nakagawa
    > github: @repeatedly

    > Treasure Data, Inc.
    > Senior Software Engineer
    > Fluentd / td-agent developer

    > Living at OSS :)
    > D language: Phobos (a.k.a. the standard library) committer
    > Fluentd: main maintainer
    > MessagePack / RPC: D and Python (RPC only)
    > Organizer of several meetups (Presto, DTM, etc.)
    > etc.

  • Structured logging

    Reliable forwarding

    Pluggable architecture

    http://fluentd.org/
    github: fluent/fluentd

  • What's Fluentd?

    > Data collector for a unified logging layer
    > Streaming data transfer based on JSON
    > Simple core + plugins written in Ruby
    > Various gem-based plugins: http://www.fluentd.org/plugins
    > List of users: http://www.fluentd.org/testimonials

  • Before

    duplicated code for error handling... messy code for retrying mechanism...

  • So painful!

  • After

  • Concept / Design

  • Core / Plugins

    Core (common concerns):
    > Divide & Conquer
    > Buffering & Retrying
    > Error handling
    > Message routing
    > Parallelism

    Plugins (use case specific):
    > Read / receive data
    > Parse data
    > Filter data
    > Buffer data
    > Format data
    > Write / send data

  • Event structure (log message)

    Time
    > default second unit
    > from data source

    Tag
    > for message routing
    > where is it from?

    Record
    > JSON format
    > MessagePack internally
    > schema-free
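To make the three parts of an event concrete, here is a minimal Python sketch of one event as a (tag, time, record) triple. The `Event` class, its field names and the JSON view are illustrative assumptions; Fluentd itself carries records as MessagePack internally, which this sketch does not model.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Event:
    """One log event: where it came from, when it happened, and what it says."""
    tag: str                                    # used for routing, e.g. "backend.apache"
    time: int                                   # Unix time in seconds (the default unit)
    record: dict = field(default_factory=dict)  # schema-free key/value data

    def to_json(self) -> str:
        # JSON view of the event as a [tag, time, record] array
        return json.dumps([self.tag, self.time, self.record], sort_keys=True)


ev = Event("backend.apache", 1426291200, {"host": "192.168.0.1", "code": 200})
print(ev.to_json())
# prints ["backend.apache", 1426291200, {"code": 200, "host": "192.168.0.1"}]
```

Because the record is schema-free, two events with the same tag may carry entirely different keys; only the tag and time have a fixed meaning.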

  • Reliable streaming data transfer

    error retry

    error retry retry

    retry

    Batch

    Stream

    Other stream

    (micro batch)
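The stream-versus-micro-batch idea can be sketched in Python: events are grouped into a small batch, and a failed write is retried with a growing delay instead of being dropped. This is a minimal sketch assuming exponential backoff with a doubling wait; `send_with_retry`, its parameters and its defaults are hypothetical and are not Fluentd's own settings.

```python
import time
from typing import Callable, List


def send_with_retry(batch: List[dict],
                    write: Callable[[List[dict]], None],
                    max_retries: int = 5,
                    initial_wait: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Write one micro-batch, retrying failures with a doubling wait.

    Returns the number of attempts used. Once max_retries retries have
    failed, the last error is re-raised so the caller can keep the batch.
    """
    wait = initial_wait
    attempt = 0
    while True:
        attempt += 1
        try:
            write(batch)
            return attempt
        except Exception:
            if attempt > max_retries:
                raise
            sleep(wait)      # back off before the next attempt
            wait *= 2        # exponential backoff: 1s, 2s, 4s, ...


# Usage: a destination that fails twice, then accepts the batch.
calls = {"n": 0}


def flaky_write(batch: List[dict]) -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("destination unavailable")


waits: List[float] = []
attempts = send_with_retry([{"msg": "a"}, {"msg": "b"}], flaky_write,
                           sleep=waits.append)
print(attempts, waits)  # 3 [1.0, 2.0]
```

Retrying a whole batch keeps the retry bookkeeping per batch rather than per event, which is the trade-off the micro-batch label points at.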

  • Sources (M): Apache, Frontend, Backend, syslogd, Databases (access logs, app logs, system logs)

    Fluentd in the middle: buffering / retrying / routing

    Destinations (N): Nagios, PostgreSQL, Hadoop, Amazon S3, Elasticsearch (alerting, analysis, archiving)

    M x N → M + N plugins
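The M x N → M + N claim is simple arithmetic: wiring every source directly to every destination needs one integration per pair, while a shared layer needs one input plugin per source and one output plugin per destination. A small Python check, using the five sources and five destinations named on the slide (the `integrations` helper is hypothetical):

```python
def integrations(m_sources: int, n_destinations: int) -> tuple:
    """Compare point-to-point wiring with a shared logging layer."""
    point_to_point = m_sources * n_destinations  # every source talks to every sink
    via_layer = m_sources + n_destinations       # one input per source, one output per sink
    return point_to_point, via_layer


# Sources: Apache, Frontend, Backend, syslogd, Databases (M = 5)
# Destinations: Nagios, PostgreSQL, Hadoop, Amazon S3, Elasticsearch (N = 5)
print(integrations(5, 5))  # (25, 10)
```

The gap widens quickly: with 10 sources and 20 destinations it is 200 integrations versus 30 plugins.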

  • Use case

  • Simple forwarding

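  • # logs from a file
    <source>
      type tail
      path /var/log/httpd.log
      pos_file /tmp/pos_file
      format apache2
      tag backend.apache
    </source>

    # logs from client libraries
    <source>
      type forward
      port 24224
    </source>

    # store logs to MongoDB
    # (match pattern not preserved in the transcript; backend.* assumed)
    <match backend.*>
      type mongo
      database fluent
      collection test
    </match>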
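Each event is routed by its tag: a match section's pattern is compared with the tag, and the first matching section, read top-down, receives the event. The Python below is a simplified model of that rule, assuming `*` matches exactly one dot-separated part and `**` matches zero or more parts. `tag_matches` and `route` are hypothetical helpers; partial wildcards such as `acc*` and `{a,b}` alternatives are not modelled.

```python
from typing import List, Optional, Tuple


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified tag matching: '*' = exactly one part, '**' = zero or more parts."""
    def match(p: List[str], t: List[str]) -> bool:
        if not p:
            return not t
        head, rest = p[0], p[1:]
        if head == "**":
            # try letting '**' consume 0, 1, 2, ... tag parts
            return any(match(rest, t[i:]) for i in range(len(t) + 1))
        if not t:
            return False
        return (head == "*" or head == t[0]) and match(rest, t[1:])

    return match(pattern.split("."), tag.split("."))


def route(tag: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the output of the first rule whose pattern matches the tag."""
    for pattern, output in rules:
        if tag_matches(pattern, tag):
            return output
    return None


rules = [("backend.*", "mongo"), ("**", "stdout")]
print(route("backend.apache", rules))   # mongo
print(route("frontend.nginx", rules))   # stdout
```

Because the first match wins, a catch-all `**` rule only makes sense at the end of the list.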

  • Less Simple Forwarding

    > At-most-once / At-least-once
    > HA (failover)
    > Load-balancing
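The HA and load-balancing points can be illustrated with a small Python model: events are spread round-robin across primary aggregator nodes, and a standby node is used only when every primary is down. This is a sketch under stated assumptions; the `ForwardTargets` class, the node names and the manually set health flags are hypothetical. A real forwarder would detect failures itself (Fluentd's forward output uses heartbeats and also supports server weights), which is not modelled here.

```python
import itertools
from typing import Dict, List


class ForwardTargets:
    """Round-robin over healthy primaries; fall back to standbys when all are down."""

    def __init__(self, primaries: List[str], standbys: List[str]):
        self.primaries = primaries
        self.standbys = standbys
        # Health is set by hand in this sketch instead of by heartbeats.
        self.healthy: Dict[str, bool] = {n: True for n in primaries + standbys}
        self._rr = itertools.cycle(primaries)

    def mark(self, node: str, healthy: bool) -> None:
        self.healthy[node] = healthy

    def pick(self) -> str:
        # Load balancing: rotate through primaries, skipping unhealthy ones.
        for _ in range(len(self.primaries)):
            node = next(self._rr)
            if self.healthy[node]:
                return node
        # Failover: every primary is down, so use the first healthy standby.
        for node in self.standbys:
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy destination")


targets = ForwardTargets(["agg1", "agg2"], ["backup"])
first_round = [targets.pick() for _ in range(4)]   # ['agg1', 'agg2', 'agg1', 'agg2']
targets.mark("agg1", False)
degraded = [targets.pick(), targets.pick()]        # ['agg2', 'agg2']
targets.mark("agg2", False)
failover = targets.pick()                          # 'backup'
print(first_round, degraded, failover)
```

Delivery guarantees (at-most-once vs at-least-once) are a separate concern: they depend on whether the sender waits for an acknowledgement before discarding a chunk.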

  • Near realtime and batch combo!

    > Hot data
    > All data

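  • # logs from a file
    <source>
      type tail
      path /var/log/httpd.log
      pos_file /tmp/pos_file
      format apache2
      tag web.access
    </source>

    # logs from client libraries
    <source>
      type forward
      port 24224
    </source>

    # store logs to ES and HDFS
    # (match pattern not preserved in the transcript; web.* assumed)
    <match web.*>
      type copy
      <store>
        type elasticsearch
        logstash_format true
      </store>
      <store>
        type webhdfs
        host namenode
        port 50070
        path /path/on/hdfs/
      </store>
    </match>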
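The copy output above hands every event to each store listed under it, so the same record can feed near-realtime search and batch archiving. A minimal Python sketch of that fan-out, with in-memory lists standing in (hypothetically) for Elasticsearch and HDFS; `make_copy` is an illustrative name, not a Fluentd API.

```python
from typing import Callable, Dict, List

Sink = Callable[[str, int, Dict], None]


def make_copy(stores: List[Sink]) -> Sink:
    """Return an output that hands every event to each store in turn."""
    def emit(tag: str, time: int, record: Dict) -> None:
        for store in stores:
            # pass each store its own copy so one store cannot alter another's data
            store(tag, time, dict(record))
    return emit


hot: List[Dict] = []       # stand-in for Elasticsearch (hot data)
archive: List[Dict] = []   # stand-in for HDFS (all data)
copy = make_copy([lambda t, ts, r: hot.append(r),
                  lambda t, ts, r: archive.append(r)])
copy("web.access", 1426291200, {"path": "/", "code": 200})
print(len(hot), len(archive))  # 1 1
```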

  • CEP for Stream Processing

    Norikra is a SQL-based CEP (complex event processing) engine: http://norikra.github.io/

  • Container Logging

  • Fluentd on Kubernetes / GCE

    > Kubernetes
    > Google Compute Engine
    > https://cloud.google.com/logging/docs/install/compute_install

  • Slideshare

    http://engineering.slideshare.net/2014/04/skynet-project-monitor-scale-and-auto-heal-a-system-in-the-cloud/

  • Log Analysis System And its designs in LINE Corp. 2014 early

  • Architecture

  • Internal Architecture

    Input / Parser / Filter / Buffer / Output / Formatter

    > input-ish: Input, Parser
    > output-ish: Output, Buffer, Formatter
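To show how the plugin types divide the work, here is a hedged Python sketch of one pass through the stages, collapsed into a single function. `run_pipeline` and its parameters are hypothetical; real Fluentd plugins are Ruby classes, and buffering and flushing happen asynchronously, which this sketch ignores.

```python
import json
from typing import Callable, Iterable, List, Optional

Record = dict


def run_pipeline(lines: Iterable[str],
                 parse: Callable[[str], Optional[Record]],
                 filters: List[Callable[[Record], Optional[Record]]],
                 fmt: Callable[[Record], str],
                 chunk_size: int,
                 write: Callable[[List[str]], None]) -> None:
    buffer: List[str] = []
    for line in lines:                  # Input: raw lines from a source
        record = parse(line)            # Parser: text -> record (None = unparsable)
        for f in filters:               # Filter: each may mutate or drop the record
            if record is None:
                break
            record = f(record)
        if record is None:
            continue
        buffer.append(fmt(record))      # Formatter: record -> output text
        if len(buffer) >= chunk_size:   # Buffer: hand over full chunks
            write(buffer)               # Output: write to the external system
            buffer = []
    if buffer:
        write(buffer)                   # Output: flush the last partial chunk


written: List[List[str]] = []
run_pipeline(
    ["a=1", "b=2", "junk", "c=3"],
    parse=lambda s: dict([s.split("=", 1)]) if "=" in s else None,
    filters=[lambda r: None if "b" in r else r],   # drop records with key "b"
    fmt=json.dumps,
    chunk_size=2,
    write=written.append,
)
print(written)  # [['{"a": "1"}', '{"c": "3"}']]
```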

  • Input plugins

    > Receive logs
    > Or pull logs from data sources
    > non-blocking

    File tail (in_tail)
    Syslog (in_syslog)
    HTTP (in_http)
    HTTP/2 (in_http2, WIP)
    ...
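`in_tail` is the input used in the forwarding examples (`type tail` with a `pos_file`): it remembers how far it has read so that it can resume after a restart. The Python sketch below models only that bookkeeping, assuming a single file, byte offsets and no log rotation; `read_new_lines` is a hypothetical helper, and it polls rather than being non-blocking.

```python
import os
import tempfile
from typing import List


def read_new_lines(path: str, pos_file: str) -> List[str]:
    """Return complete lines appended to `path` since the offset saved in `pos_file`."""
    offset = 0
    if os.path.exists(pos_file):
        with open(pos_file) as f:
            offset = int(f.read().strip() or 0)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline; a partial line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    with open(pos_file, "w") as f:
        f.write(str(offset + end))  # persist the new position (rotation not handled)
    return lines


d = tempfile.mkdtemp()
log, pos = os.path.join(d, "httpd.log"), os.path.join(d, "pos_file")
with open(log, "w") as f:
    f.write("first\nsecond\npart")
first = read_new_lines(log, pos)    # ['first', 'second']
with open(log, "a") as f:
    f.write("ial\nthird\n")
second = read_new_lines(log, pos)   # ['partial', 'third']
print(first, second)
```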

  • Parser plugins

    > Parse into JSON
    > Common formats out of the box
    > Some input plugins depend on Parser plugins
    > v0.10.46 and above

    JSON
    Regexp
    Apache / Nginx / Syslog
    CSV / TSV
    etc.
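A regexp-based parser is the core idea behind several of the formats listed above: a pattern with named groups turns a text line into a record. A hedged Python sketch for an Apache-style access log; `APACHE_RE` is a simplified approximation written for this example, not the exact expression Fluentd ships, and `parse_apache` is a hypothetical helper.

```python
import re
from typing import Optional

# Simplified Apache "combined" access-log pattern (an approximation for this example).
APACHE_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_apache(line: str) -> Optional[dict]:
    """Turn one access-log line into a record, or None if it does not match."""
    m = APACHE_RE.match(line)
    if m is None:
        return None
    # Drop optional groups that did not participate in the match.
    record = {k: v for k, v in m.groupdict().items() if v is not None}
    record["code"] = int(record["code"])
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return record


line = ('192.168.0.1 - - [14/Mar/2015:10:00:00 +0800] "GET /index.html HTTP/1.1" '
        '200 1024 "-" "curl/7.38.0"')
print(parse_apache(line))
```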

  • Filter plugins

    > Filter / mutate records
    > Record level and stream level
    > v0.12 and above

    grep
    record_transformer
    suppress
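Record-level filtering can be pictured as a chain of functions, each returning a (possibly changed) record or dropping it. The Python below is a minimal sketch of that idea with two hypothetical helpers loosely modelled on grep and record_transformer; their names and arguments are not Fluentd configuration parameters.

```python
import re
from typing import Callable, List, Optional

Filter = Callable[[dict], Optional[dict]]


def grep(key: str, pattern: str) -> Filter:
    """Keep only records whose `key` matches `pattern`; drop the rest."""
    rx = re.compile(pattern)

    def f(record: dict) -> Optional[dict]:
        value = record.get(key)
        return record if value is not None and rx.search(str(value)) else None
    return f


def add_fields(**extra) -> Filter:
    """Return a new record with extra fields added (original left untouched)."""
    def f(record: dict) -> Optional[dict]:
        return {**record, **extra}
    return f


def apply(record: dict, filters: List[Filter]) -> Optional[dict]:
    for f in filters:
        record = f(record)
        if record is None:      # a filter dropped the record
            return None
    return record


chain = [grep("path", r"^/api/"), add_fields(service="web")]
print(apply({"path": "/api/users"}, chain))        # {'path': '/api/users', 'service': 'web'}
print(apply({"path": "/static/logo.png"}, chain))  # None
```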

  • Buffer plugins

    > Improve performance
    > Provide reliability
    > Provide thread-safety

    Memory (buf_memory)
    File (buf_file)

  • Buffer internal

    > Chunk = adjustable unit of data
    > Buffer = queue of chunks

    Input → chunk | chunk | chunk → output
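The chunk/queue split can be sketched in Python as a simplified model under stated assumptions: chunks are closed by record count rather than by byte size or time, everything stays in memory (closer to buf_memory than buf_file), and `ChunkBuffer` is a hypothetical name. The point it shows is that a chunk leaves the queue only after it has been written, so a failed write can be retried later.

```python
from collections import deque
from typing import Callable, Deque, List


class ChunkBuffer:
    """Buffer = queue of chunks; a chunk is closed once it holds chunk_limit records."""

    def __init__(self, chunk_limit: int):
        self.chunk_limit = chunk_limit
        self.current: List[str] = []              # chunk still being filled
        self.queue: Deque[List[str]] = deque()    # closed chunks waiting for output

    def append(self, item: str) -> None:
        self.current.append(item)
        if len(self.current) >= self.chunk_limit:
            self.queue.append(self.current)
            self.current = []

    def flush(self, write: Callable[[List[str]], None]) -> int:
        """Write queued chunks in order and return how many were written."""
        written = 0
        while self.queue:
            write(self.queue[0])    # if this raises, the chunk stays queued
            self.queue.popleft()
            written += 1
        return written


buf = ChunkBuffer(chunk_limit=2)
for msg in ["a", "b", "c", "d", "e"]:
    buf.append(msg)
out: List[List[str]] = []
n = buf.flush(out.append)
print(n, out, buf.current)  # 2 [['a', 'b'], ['c', 'd']] ['e']
```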

  • Formatter plugins

    > Format output
    > Some plugins depend on Formatter plugins
    > v0.10.46 and above

    JSON
    CSV / TSV
    single value
    msgpack
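Formatter plugins turn a record back into text for the destination. A minimal Python sketch of the JSON, TSV, CSV and single-value formats listed above; the function names, the fixed column list for CSV/TSV and the `message` key for single value are assumptions made for this example. msgpack is left out because it needs a third-party library.

```python
import csv
import io
import json


def format_json(record: dict) -> str:
    return json.dumps(record, sort_keys=True) + "\n"


def format_tsv(record: dict, keys: list) -> str:
    # Fixed column order; missing fields become empty strings.
    return "\t".join(str(record.get(k, "")) for k in keys) + "\n"


def format_csv(record: dict, keys: list) -> str:
    # csv.writer quotes fields that contain the delimiter.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([record.get(k, "") for k in keys])
    return buf.getvalue()


def format_single_value(record: dict, key: str = "message") -> str:
    return str(record[key]) + "\n"


rec = {"message": "hello, world", "code": 200}
for line in (format_json(rec), format_tsv(rec, ["code", "message"]),
             format_csv(rec, ["code", "message"]), format_single_value(rec)):
    print(line, end="")
```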

  • Output plugins

    > Write to external systems
    > Buffered & non-buffered
    > 200+ plugins

    File (out_file)
    Amazon S3 (out_s3)
    MongoDB (out_mongo)
    ...

  • Roadmap

    > v0.10 (old stable)
    > v0.12 (current stable)
      > Filter / Label / At-least-once
    > v0.14 (spring, 2015)
      > New plugin APIs, ServerEngine, Time
    > v1 (summer, 2015)
      > Fix new features / APIs
    > https://github.com/fluent/fluentd/wiki/V1-Roadmap

  • Goodies

  • fluent-bit

    > Made for Embedded Linux
    > OpenEmbedded & Yocto Project
    > Intel Edison, RasPi & BeagleBone Black boards
    > https://github.com/fluent/fluent-bit
    > Standalone application or library mode
    > Built-in plugins
    > input: cpu, kmsg; output: fluentd
    > First release at the end of Mar 2015

  • fluentd-ui

    > Manage a Fluentd instance via a web UI
    > https://github.com/fluent/fluentd-ui

  • Treasure Agent (td-agent)

    > Treasure Data distribution of Fluentd
    > Includes Ruby and QAed plugins
    > Treasure Agent 2 is the current stable release
    > We recommend using v2, not v1
    > Includes fluentd-ui
    > Next release, 2.2.0, uses Fluentd v0.12

  • Embulk

    > Bulk loader version of Fluentd
    > Pluggable architecture
    > JRuby, JVM languages
    > High-performance parallel processing
    > Share your script as a plugin
    > https://github.com/embulk
    > http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed

  • Embulk: bulk load

    Plugins → Embulk → Plugins

    > Parallel execution
    > Data validation
    > Error recovery
    > Deterministic behaviour
    > Idempotent retrying

    HDFS, MySQL, Amazon S3, CSV Files, SequenceFile, Salesforce.com, Elasticsearch, Cassandra, Hive, Redis

  • Check: treasuredata.com, a cloud service for the entire data pipeline