Fluentd meetup #3


Page 1: Fluentd meetup #3

Sadayuki Furuhashi
Treasure Data, Inc.
Founder & Software Architect

Collecting app metrics in decentralized systems
Decision making based on facts


Page 2

Self-introduction

> Sadayuki Furuhashi

> Treasure Data, Inc., Founder & Software Architect

> Open source projects:
MessagePack - efficient serializer (original author)

Fluentd - event collector (original author)

Page 3

My Talk

What’s our service?

What problems did we face?

How did we solve them?

What did we learn?

We open sourced the system

Page 4

What’s Treasure Data?

Treasure Data provides a cloud-based data warehouse as a service.

Page 5

Treasure Data Service Architecture

(diagram: Apache, apps, and other data sources feed td-agent (open sourced) into the Treasure Data columnar data warehouse, alongside RDBMS sources; a Query Processing Cluster runs MapReduce jobs via HIVE and PIG (to be supported); users reach the Query API over JDBC/REST with td-command and BI apps)

Page 6

Example Use Case – MySQL to TD

(diagram: hundreds of Rails app servers write logs to text files; nightly INSERTs load them into MySQL; daily/hourly batches feed KPI visualization, feedback rankings, and Google Spreadsheet)

- Limited scalability
- Fixed schema
- Not realtime
- Unexpected INSERT latency

Page 7

Example Use Case – MySQL to TD

(diagram: hundreds of Rails app servers send event logs through td-agent to Treasure Data; logs are available after several minutes; daily/hourly batches feed MySQL, KPI visualization, feedback rankings, and Google Spreadsheet)

✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact

Page 8

What’s Treasure Data?

Key differentiators:
> TD delivers BigData analytics
> in days, not months
> without specialists or IT resources
> for 1/10th the cost of the alternatives

Why? Because it’s a multi-tenant service.

Page 9

Problem 1: investigating problems took time

Customers need support...

> “I uploaded data but can’t see it in queries”

> “Downloading query results takes time”

> “Our queries have been taking longer recently”

Page 10

Problem 1: investigating problems took time

Investigating these problems took time because:

doubts.count.times {
  servers.count.times {
    # ssh to a server
    # grep logs
  }
}

Page 11

The actual facts:

> Actually, data were not uploaded
(the clients had a problem: disk full)

We ought to have monitored uploading so that we would immediately know when we’re not getting data from a user.

> Our servers were getting slower because of increasing load

We ought to have noticed it and added servers before the problem hit.

> There was a bug which occurred only under a specific condition

We ought to have collected unexpected errors and fixed them as soon as possible, saving time for both us and our users.

Page 12

Problem 2: many tasks to do, but hard to prioritize

We want to...

> fix bugs
> improve performance
> increase the number of sign-ups
> increase the number of queries by customers
> increase the number of periodic queries

What’s the “bottleneck” which should be solved first?

Page 13

data: Performance is getting worse.
decision: Let’s add servers.

data: Many customers upload data, but few customers issue queries.
decision: Let’s improve the documentation.

data: A customer stopped uploading data.
decision: They might have a problem on the client side.

Problem 2: many tasks to do, but hard to prioritize

We need data to make decisions.

Page 14

How did we solve them?

We collected application metrics.

Page 15

Treasure Data’s backend architecture

(diagram: Frontend → Job Queue → Worker → Hadoop)

Page 16

Solution v1:

(diagram: Frontend, Job Queue, Worker, and Hadoop nodes → Fluentd)

Fluentd pulls metrics every minute (in_exec plugin)

Librato Metrics for realtime analysis

Treasure Data for historical analysis
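The v1 pull setup could be sketched as a fluentd.conf fragment like the following (a sketch only: the metrics script, its keys, and the Librato output plugin name are assumptions; `in_exec` and the `copy` output are standard Fluentd):

```
# Pull metrics once a minute with the in_exec input plugin.
<source>
  type exec
  # hypothetical script that prints tab-separated metric values
  command ruby /opt/td/collect_metrics.rb
  keys queued_jobs,running_jobs
  tag metrics.job_queue
  run_interval 1m
</source>

# Fan the same stream out to the realtime and historical stores.
<match metrics.**>
  type copy
  <store>
    type librato        # illustrative name for a Librato Metrics output
  </store>
  <store>
    type tdlog          # Treasure Data output (fluent-plugin-td)
    apikey YOUR_API_KEY
    auto_create_table
  </store>
</match>
```

The drawback, as the next slides show, is that every new metric means another `<source>` block in this central file.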

Page 17
Page 18

What’s solved

We can monitor the overall behavior of servers.

We can notice performance degradation.

We can get alerts when a problem occurs.

Page 19

What’s not solved

We can’t get detailed information.
> how much data is “this user” uploading?

The configuration file is complicated.
> we need to add lines to declare new metrics

The monitoring server is a SPOF (single point of failure).

Page 20

Solution v2:

(diagram: Frontend, Job Queue, Worker, and Hadoop nodes → local Fluentd → aggregator Fluentd)

Applications push metrics to Fluentd (via a local Fluentd)

Fluentd sums up the data every minute (partial aggregation)

Librato Metrics for realtime analysis

Treasure Data for historical analysis

Page 21

What’s solved by v2

We can get detailed information directly from applications

> graphs for each customer

DRY - we can keep configuration files simple
> Just add one line to apps
> No need to update fluentd.conf

Decentralized streaming aggregation
> partial aggregation on Fluentd,

total aggregation on Librato Metrics
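The two-stage aggregation can be sketched in Ruby (the record shape and function names are mine, not the actual fluent-plugin-metricsense code): each node pre-sums values per metric and minute, so only one small record per metric per minute crosses the network, and the final stage merely merges partial sums.

```ruby
# Partial aggregation: what each Fluentd node does locally.
# Events are {metric:, time:, value:} hashes; sums are keyed by
# [metric, minute-boundary].
def partial_aggregate(events)
  events.each_with_object(Hash.new(0)) do |e, sums|
    minute = e[:time] - e[:time] % 60
    sums[[e[:metric], minute]] += e[:value]
  end
end

# Total aggregation: merge the partial sums from all nodes.
def total_aggregate(partials)
  partials.each_with_object(Hash.new(0)) do |sums, total|
    sums.each { |key, v| total[key] += v }
  end
end

node_a = partial_aggregate([
  {metric: "import.size", time: 1200, value: 10},
  {metric: "import.size", time: 1210, value: 5},
])
node_b = partial_aggregate([
  {metric: "import.size", time: 1230, value: 7},
])
total = total_aggregate([node_a, node_b])
# total[["import.size", 1200]] == 22
```

Because sums are associative, it doesn't matter how events are split across nodes; the merged total is the same as aggregating everything centrally.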

Page 22
Page 23

API

MetricSense.value(:size => 32)

MetricSense.segment(:account => 1)

MetricSense.fact(:path => '/path1')

MetricSense.measure!
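A minimal sketch of how such an API might work internally (an assumption for illustration; the real implementation lives in the metricsense repository): value/segment/fact accumulate one pending record, and measure! flushes it.

```ruby
# Illustrative MetricSense-like module: calls accumulate into one
# pending record; measure! would normally emit it to a local Fluentd,
# but here it just returns the record and resets the buffer.
module MetricSense
  @record = {}

  def self.value(h);   @record[:value] = h;                   self end
  def self.segment(h); (@record[:segments] ||= {}).merge!(h); self end
  def self.fact(h);    (@record[:facts]   ||= {}).merge!(h);  self end

  def self.measure!
    r, @record = @record, {}
    r
  end
end

MetricSense.value(:size => 32)
MetricSense.segment(:account => 1)
MetricSense.fact(:path => '/path1')
record = MetricSense.measure!
# record[:value] == {:size => 32}
```

This is what makes adding a metric DRY: one line in the app, no change to fluentd.conf, because the record itself carries the metric name and segments.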

Page 24

What did we learn?

> We always have lots of tasks
> we need data to prioritize them.

> Problems are usually complicated
> we need data to save time.

> Adding metrics should be DRY
> otherwise you get bored and will not add metrics.

> Realtime analysis is useful, but we still need batch analysis.
> “who is not issuing queries, despite storing data last month?”
> “which pages did users look at before sign-up?”
> “which pages did users not look at before running into trouble?”

Page 25

We open sourced

MetricSense
https://github.com/treasure-data/metricsense

Page 26

Components of MetricSense

metricsense.gem
> client library for Ruby to send metrics

fluent-plugin-metricsense
> plugin for Fluentd to collect metrics
> pluggable backends:
> Librato Metrics backend
> RDBMS backend

Page 27

RDB backend for MetricSense

Aggregates metrics on an RDBMS in a form optimized for time-series data.

> Borrowed concepts from OpenTSDB and the OLAP cube.

data:
base_time  metric_id  segment_id  m0  m1  m2  ...  m59
19:00      1          5           25  31  19  ...  21
21:00      2          5           75  94  68  ...  72
21:00      2          6           63  82  55  ...  63

metric_tags:
metric_id  metric_name    segment_name
1          “import.size”  NULL
2          “import.size”  “account”

segment_values:
segment_id  name
5           “a001”
6           “a002”
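The OpenTSDB-style layout packs one hour per row, with sixty per-minute columns m0..m59, so a timestamp maps to a (base_time, column) pair. A sketch of that mapping (the helper name is mine):

```ruby
# Map a unix timestamp to its hourly row and per-minute column.
def row_position(unix_time)
  base_time = unix_time - unix_time % 3600   # hour boundary of the row
  column    = (unix_time % 3600) / 60        # which of m0 .. m59
  [base_time, "m#{column}"]
end

row_position(68_520)  # => [68400, "m2"]  (two minutes past the hour)
```

Packing a whole hour into one row keeps the row count small and makes range scans over time cheap, at the cost of updating a row in place as each minute's value arrives.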

Page 28

Solution v3 (future work):

Alerting using historical data
> simple machine learning to adjust the threshold

(chart: metric values diverging from the historical average trigger an alert)
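One simple form of that idea, sketched in Ruby (an assumption about the approach, not the actual implementation): derive the alert threshold from the historical mean and standard deviation rather than a fixed constant, so the threshold adapts as the historical data changes.

```ruby
# Alert when the current value sits more than k standard deviations
# above the historical mean.
def alert?(history, current, k = 3.0)
  mean = history.sum.to_f / history.size
  var  = history.sum { |v| (v - mean)**2 } / history.size
  current > mean + k * Math.sqrt(var)
end

history = [10, 12, 11, 9, 10, 11]
alert?(history, 11)   # false: within the normal band
alert?(history, 50)   # true: far above the historical average
```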

Page 29
Page 30

We’re Hiring!

Page 31

Sales Engineer
Evangelize TD/Fluentd. Get everyone excited!

Help customers deploy and maintain TD successfully.

Preferred experience: OS, DB, BI, statistics and data science

DevOps Engineer
Development, operation, and monitoring of our large-scale, multi-tenant system

Preferred experience: large-scale system development and management

Page 32

Competitive salary + equity package

Who we want

STRONG business and customer support DNA

Everyone is equally responsible for customer support
Customer success = our success

Self-disciplined and responsible
Be your own manager

Team player with excellent communication skills
Distributed team and global customer base

Contact me: [email protected]