Distributed Logging Architecture in Container Era · Why Fluentd? • Docker Fluentd logging driver...

Distributed Logging Architecture in Container Era LinuxCon Japan 2016 at Jun 13 2016 Satoshi "Moris" Tagomori (@tagomoris)


Page 1:

Distributed Logging Architecture in Container Era

LinuxCon Japan 2016 at Jun 13 2016

Satoshi "Moris" Tagomori (@tagomoris)

Page 2:

Satoshi "Moris" Tagomori (@tagomoris)

Fluentd, MessagePack-Ruby, Norikra, ...

Treasure Data, Inc.

Page 3:

Topics

• Microservices and logging in various industries

• Difficulties of logging with containers

• Distributed logging architecture

• Patterns of distributed logging architecture

• Case Study: Docker and Fluentd

• Why OSS is important for logging

Page 4:

Logging

Page 5:

Logging in Various Industries

• Web access logs
  • Views/visitors on media
  • Views/clicks on ads

• Commercial transactions (EC, Game, ...)

• Data from devices
  • Operation logs from phone apps
  • Various sensor data

Page 6:

Microservices and Logging

• Monolithic service
  • A single service produces all data about a user's access

• Microservices
  • Many services produce data about a user's access
  • Logs must be collected from many services to know what is happening

[Diagram labels: Users, Service (Application), Logs]

Page 7:

Logging and Containers

Page 8:

Containers: "a must" for microservices

• Dividing a service into services
  • Each service requires fewer computing resources (VM -> containers)

• Making services independent from each other
  • but it is very difficult :(
  • some dependencies must be resolved even in the development environment (containers on desktop)

Page 9:

Redesign Logging: Why?

• No permanent storage

• No fixed physical/network address

• No fixed mapping between servers and roles

Page 10:

Containers: immutable & disposable

• No permanent storage

• Where to write logs?
  • A file in the container → gone with the container instance 😞
  • A directory shared from the host → hosts are shared by many services ☹

• TODO: ship logs from the container to anywhere ASAP

Page 11:

Containers: unfixed addresses

• No fixed physical / network address

• Where should we go to fetch logs?
  • Service discovery (e.g., Consul) → one more component 😞
  • rsync? ssh+tail? or ..? Is it installed in the container? → one more tool to depend on ☹

• TODO: push logs to anywhere from containers

Page 12:

Containers: instances per role

• No fixed mapping between servers and roles

• How can we parse / store these logs?
  • Central repository of log syntax → very hard to maintain 😞
  • Label logs by source address → many containers/roles on a host ☹

• TODO: label & parse logs at the source of logs

Page 13:

Distributed Logging Architecture

Page 14:

Core Architecture

• Collector nodes

• Aggregator nodes

• Destination

[Diagram labels: Collector nodes (Docker containers + agent), Aggregator nodes, Destination (Storage, Database, ...)]

Page 15:

Collecting and Storing Data

• Parse (collector)
  • Raw logs are not good for processing
  • Convert logs to structured data (key-value pairs)

• Sort/Shuffle (aggregator)
  • Mixed logs are not good for scanning
  • Split the whole data stream into separate streams

• Store (destination)
  • Format logs (records) as the destination expects
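The three steps above (parse on the collector, sort/shuffle on the aggregator, format for the destination) can be sketched in a few lines of Python. This is only an illustration, not Fluentd code: the log layout, the tag names, and the function names `parse`, `shuffle`, and `format_for_destination` are assumptions made for the example.

```python
import json
import re
from collections import defaultdict

# Hypothetical access-log layout, used only for illustration:
#   <ip> <method> <path> <status>
LINE_RE = re.compile(r"(?P<ip>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d{3})")


def parse(tag, raw_line):
    """Collector step: turn a raw line into a tagged key-value record."""
    match = LINE_RE.fullmatch(raw_line.strip())
    if match is None:
        # Keep unparseable lines under a separate tag instead of dropping them.
        return tag + ".unparsed", {"message": raw_line.strip()}
    record = match.groupdict()
    record["status"] = int(record["status"])
    return tag, record


def shuffle(events):
    """Aggregator step: split one mixed stream into per-tag streams."""
    streams = defaultdict(list)
    for tag, record in events:
        streams[tag].append(record)
    return dict(streams)


def format_for_destination(records):
    """Store step: render records as the destination expects (here: JSON lines)."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


mixed = [
    parse("web.access", "10.0.0.1 GET /index.html 200"),
    parse("api.access", "10.0.0.2 POST /v1/items 201"),
    parse("web.access", "not a valid line"),
]
streams = shuffle(mixed)
print(format_for_destination(streams["web.access"]))
```

Parsing happens once, close to the source, so the later stages only handle structured records and never need to know each service's log syntax.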

Page 16:

Scaling Logging

• Network traffic

• CPU load to parse / format
  • Parse logs on each collector (distributed)
  • Format logs on aggregator (to be distributed)

• Capability
  • Make aggregators redundant

• Controlling delay

Page 17:

Patterns

Page 18:

Aggregation Patterns

[Diagram: 2x2 matrix of source aggregation (NO / YES) against destination aggregation (NO / YES)]

Page 19:

Source Side Aggregation Patterns

[Diagram: w/o source aggregation vs. w/ source aggregation; labels: collector, aggregator, aggregate container]

Page 20:

Without Source Aggregation

• Pros:
  • Simple configuration

• Cons:
  • Fixed aggregator (endpoint) address
  • Many network connections
  • High load on the aggregator

[Diagram labels: collector, aggregator]

Page 21:

With Source Aggregation

• Pros:
  • Fewer connections
  • Lower load on the aggregator
  • Less configuration in containers (by specifying localhost)
  • Highly flexible configuration (by deployment only for aggregate containers)

• Cons:
  • Slightly more resources (+1 container per host)

[Diagram label: aggregate container]
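A rough way to see the connection-count trade-off between the two source side patterns is to count what the aggregator tier has to accept. The sketch below assumes one open connection per sender, and the cluster size is a made-up figure for illustration only.

```python
def aggregator_connections(hosts, containers_per_host, source_aggregation):
    """Count connections arriving at the aggregator tier.

    Assumes every sender keeps exactly one connection open, which is a
    simplification made for this sketch.
    """
    if source_aggregation:
        # Containers talk to the aggregate container on localhost;
        # only that one container per host connects upstream.
        return hosts
    # Every container connects to the aggregator directly.
    return hosts * containers_per_host


# Hypothetical cluster size, chosen only to make the difference visible.
without = aggregator_connections(hosts=50, containers_per_host=20, source_aggregation=False)
with_agg = aggregator_connections(hosts=50, containers_per_host=20, source_aggregation=True)
print(without, with_agg)
```

The price is the extra aggregate container on each host, which is the single "Cons" item listed above.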

Page 22:

Destination Side Aggregation Patterns

[Diagram: w/o destination aggregation vs. w/ destination aggregation; labels: aggregator, collector, destination]

Page 23:

Without Destination Aggregation

• Pros:
  • Fewer nodes
  • Simpler configuration

• Cons:
  • Storage side changes affect the collector side
  • Worse performance: many small write requests on storage

Page 24:

With Destination Aggregation

• Pros:
  • Collector side configuration is free from storage side changes
  • Better performance with fine tuning on the destination side aggregator

• Cons:
  • More nodes
  • Slightly more complex configuration

[Diagram label: aggregator]
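One reason a destination side aggregator can perform better is that it can buffer records and write them to storage in larger chunks instead of many small requests. The following is a minimal sketch of that idea, not Fluentd's buffer implementation; the class name `BatchingWriter` and the `write_chunk` callback are hypothetical.

```python
class BatchingWriter:
    """Buffers records and writes them to storage in chunks."""

    def __init__(self, write_chunk, chunk_size=1000):
        self._write_chunk = write_chunk  # callable taking a list of records
        self._chunk_size = chunk_size
        self._buffer = []

    def append(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self._chunk_size:
            self.flush()

    def flush(self):
        # Hand the buffered records to storage as a single write.
        if self._buffer:
            chunk, self._buffer = self._buffer, []
            self._write_chunk(chunk)


# Stand-in for storage: remember each chunk that would have been written.
write_calls = []
writer = BatchingWriter(write_calls.append, chunk_size=3)
for i in range(7):
    writer.append({"seq": i})
writer.flush()
print([len(chunk) for chunk in write_calls])  # [3, 3, 1]
```

Seven records reach storage in three writes instead of seven. A real aggregator would presumably also flush on a timer, so that buffering does not add unbounded delay.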

Page 25:

Scaling Patterns

• Scaling Up Endpoints
  • HTTP/TCP load balancer
  • Huge queue + workers

• Scaling Out Endpoints
  • Round-robin clients

[Diagram labels: Load balancer, Backend nodes, Collector nodes, Aggregator nodes]

Page 26:

Scaling Up Endpoints

• Pros:
  • Simple configuration in collector nodes

• Cons:
  • Scaling up limit

[Diagram labels: Load balancer, Backend nodes]

Page 27:

Scaling Out Endpoints

• Pros:
  • Unlimited scaling by adding aggregator nodes

• Cons:
  • Complex configuration
  • Clients need round-robin features
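The "round-robin" client feature that scaling out depends on can be sketched as follows. This is an illustration under stated assumptions, not the behavior of any particular client: `RoundRobinSender`, the endpoint names, and the `fake_send` transport are all hypothetical.

```python
import itertools


class RoundRobinSender:
    """Rotates over aggregator endpoints, skipping ones that fail."""

    def __init__(self, endpoints, send):
        self._endpoints = list(endpoints)
        self._cycle = itertools.cycle(self._endpoints)
        self._send = send  # callable(endpoint, record); raises OSError on failure

    def emit(self, record):
        # Try each endpoint at most once per record.
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            try:
                self._send(endpoint, record)
                return endpoint
            except OSError:
                continue
        raise RuntimeError("no aggregator endpoint accepted the record")


# Stand-in transport for the sketch: "agg-2" is pretended to be down.
def fake_send(endpoint, record):
    if endpoint == "agg-2":
        raise ConnectionError("endpoint down")


sender = RoundRobinSender(["agg-1", "agg-2", "agg-3"], fake_send)
print([sender.emit({"n": i}) for i in range(4)])
# ['agg-1', 'agg-3', 'agg-1', 'agg-3']
```

Every client has to carry the endpoint list, which is where the "complex configuration" cost comes from.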

Page 28:

• Scaling Up Endpoints
  • Without Destination Aggregation: Systems in early stages
  • With Destination Aggregation: Collecting logs over the Internet, or using queues

• Scaling Out Endpoints
  • Without Destination Aggregation: Impossible :( Collector nodes must know all endpoints → Uncontrollable
  • With Destination Aggregation: Collecting logs in a datacenter

Page 29:

Case Studies

Page 30:

Case Study: Docker+Fluentd

• Destination aggregation + scaling up
  • Fluent logger + Fluentd

• Source aggregation + scaling up
  • Docker json logger + Fluentd + Elasticsearch
  • Docker fluentd logger + Fluentd + Kafka

• Source/Destination aggregation + scaling out
  • Docker fluentd logger + Fluentd

Page 31:

Why Fluentd?

• Docker Fluentd logging driver
  • Docker containers can send logs into Fluentd directly - less overhead

• Pluggable architecture
  • Various destination systems

• Small memory footprint
  • Source aggregation requires +1 container per host
  • Less additional resource usage ( < 100MB )

Page 32:

Destination aggregation + scaling up

• Sending logs directly over TCP by Fluentd logger in application code

• Same pattern as New Relic

[Diagram label: Application code]

Page 33:

Source aggregation + scaling up

• Kubernetes: Json logger + Fluentd + Elasticsearch

• Applications write logs to STDOUT

• Docker writes logs as JSON in files

• Fluentd
  • reads logs from files
  • parses JSON objects
  • writes logs to Elasticsearch

http://kubernetes.io/docs/getting-started-guides/logging-elasticsearch/

[Diagram labels: Application code, Files (JSON), Elasticsearch]
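To make the "Docker writes logs as JSON in files" step concrete: Docker's json-file logging driver writes one JSON object per line with `log`, `stream`, and `time` keys. The sketch below shows the parsing step in plain Python rather than in Fluentd; the function name and the sample line are made up for illustration.

```python
import json


def parse_docker_json_line(line):
    """Turn one line of a Docker json-file log into a flat record."""
    entry = json.loads(line)
    return {
        # Docker keeps the trailing newline that the application wrote.
        "message": entry["log"].rstrip("\n"),
        "stream": entry["stream"],
        "time": entry["time"],
    }


# Made-up sample line in the json-file layout.
sample = '{"log":"GET /index.html 200\\n","stream":"stdout","time":"2016-06-13T01:02:03.000000000Z"}'
print(parse_docker_json_line(sample))
```

The application only writes to STDOUT; turning lines into records is left to the agent reading the files on each host.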

Page 34:

Source aggregation + scaling up

• Docker fluentd logging driver + Fluentd + Kafka

• Applications write logs to STDOUT

• Docker sends logs to localhost Fluentd

• Fluentd
  • gets logs over TCP
  • pushes logs into Kafka

[Diagram labels: Application code, Kafka]

Page 35:

Source/Destination aggregation + scaling out

• Docker fluentd logging driver + Fluentd

• Applications write logs to STDOUT

• Docker sends logs to localhost Fluentd

• Fluentd
  • gets logs over TCP
  • sends logs to the aggregator Fluentd w/ round-robin load balancing

[Diagram label: Application code]

Page 36:

What's the Best?

• Writing logs from containers: there are several ways to do it
  • Docker logging driver
  • Write logs to files + read/parse them
  • Send logs from apps directly

• Keep it scalable!
  • Source aggregation: Fluentd on localhost
  • Scalable storage: (Kafka, external services, ...)
    • No destination aggregation + Scaling up
  • Non-scalable storage: (Filesystems, RDBMSs, ...)
    • Destination aggregation + Scaling out

Page 37:

Why Is OSS Important For Logging?

Page 38:

Why OSS?

• Logging layer is an interface
  • transparency
  • interoperability

• Keep it scalable
  • number of nodes
  • number of types of source/destination

Page 39:

Use OSS, Make Logging Scalable