Fluentd and Docker - running fluentd within a docker container
Fluentd meetup #3
Transcript of Fluentd meetup #3
Sadayuki Furuhashi, Treasure Data, Inc., Founder & Software Architect
Collecting app metrics in decentralized systems: decision making based on facts
Self-introduction
> Sadayuki Furuhashi
> Treasure Data, Inc., Founder & Software Architect
> Open source projects:
  MessagePack - efficient serializer (original author)
  Fluentd - event collector (original author)
My Talk
> What's our service?
> What problems did we face?
> How did we solve them?
> What did we learn?
> We open sourced the system
What’s Treasure Data?
Treasure Data provides cloud-based data warehouse as a service.
[Diagram: Treasure Data Service Architecture. Data sources (Apache logs, apps, RDBMS, other sources) feed td-agent (open sourced), which streams into the Treasure Data columnar data warehouse. A query processing cluster runs MapReduce jobs (Hive; Pig to be supported) behind a Query API (JDBC, REST), used via td-command and BI apps.]
Example Use Case – MySQL to TD (before)
[Diagram: hundreds of Rails app servers write logs to text files; a nightly INSERT loads them into sharded MySQL; daily/hourly batch jobs produce KPI visualization and feedback rankings in Google Spreadsheet.]
- Limited scalability
- Fixed schema
- Not realtime
- Unexpected INSERT latency
Example Use Case – MySQL to TD (after)
[Diagram: hundreds of Rails app servers each run td-agent, which sends event logs to Treasure Data; logs are available after several minutes. Daily/hourly batch jobs still produce KPI visualization and feedback rankings, flowing to Google Spreadsheet and MySQL.]
✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact
What’s Treasure Data?
Key differentiators:
> TD delivers BigData analytics
> in days, not months
> without specialists or IT resources
> for 1/10th the cost of the alternatives
Why? Because it’s a multi-tenant service.
Problem 1: investigating problems took time
Customers need support...
> "I uploaded data but can't see it in queries"
> "Downloading query results takes time"
> "Our queries have been taking longer recently"
Investigating these problems took time because:

    doubts.count.times {
      servers.count.times {
        ssh to a server
        grep logs
      }
    }
* The actual facts:
> Actually the data was not uploaded (the client had a problem: disk full).
We ought to have monitored uploads so that we immediately know when we're not getting data from a user.
> Our servers were getting slower because of increasing load.
We ought to have noticed it and added servers before the problem occurred.
> There was a bug which occurred under a specific condition.
We ought to have collected unexpected errors and fixed them as soon as possible, so that both we and our users save time.
Problem 2: many tasks to do but hard to prioritize
We want to...
> fix bugs
> improve performance
> increase the number of sign-ups
> increase the number of queries by customers
> increase the number of periodic queries
What's the "bottleneck" which should be solved first?
data: Performance is getting worse. decision: Let's add servers.
data: Many customers upload data but few customers issue queries. decision: Let's improve the documentation.
data: A customer stopped uploading data. decision: They might have a problem on the client side.
We need data to make decisions.
How did we solve it?
We collected application metrics.
Treasure Data's backend architecture
[Diagram: Frontend, Job Queue, Worker, and Hadoop clusters.]
Solution v1:
[Diagram: Fluentd pulls metrics every minute from the Frontend, Job Queue, Worker, and Hadoop nodes (in_exec plugin), then forwards them to Librato Metrics for realtime analysis and to Treasure Data for historical analysis.]
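The v1 pull model can be sketched as a fluentd.conf fragment like the one below. The script path, keys, and tag are hypothetical placeholders; `command`, `format`, `keys`, `tag`, and `run_interval` are real parameters of the in_exec input plugin, and out_forward is Fluentd's standard forwarding output.

```
# in_exec runs a command periodically and emits its output as events.
<source>
  type exec
  command ruby /opt/monitor/collect_jobqueue_metrics.rb
  format tsv
  keys metric,value
  tag metrics.jobqueue
  run_interval 1m
</source>

# Forward collected metrics to the central aggregator node.
<match metrics.**>
  type forward
  <server>
    host aggregator.example.com
  </server>
</match>
```

Note the structural weakness this config makes visible: every new metric means another `<source>` block on the monitoring server, which is the complexity and SPOF problem described next.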
What's solved
We can monitor the overall behavior of the servers.
We can notice performance degradation.
We can get alerts when a problem occurs.
What's not solved
We can't get detailed information.
> how much data is "this user" uploading?
The configuration file is complicated.
> we need to add lines to declare new metrics
The monitoring server is a SPOF (single point of failure).
Solution v2:
[Diagram: applications push metrics to a local Fluentd, which forwards them upstream; Fluentd sums up the data every minute (partial aggregation) and sends it to Librato Metrics for realtime analysis and to Treasure Data for historical analysis.]
What's solved by v2
We can get detailed information directly from the applications
> graphs for each customer
DRY - we can keep configuration files simple
> just add one line to the apps
> no need to update fluentd.conf
Decentralized streaming aggregation
> partial aggregation on Fluentd, total aggregation on Librato Metrics
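The partial aggregation step can be sketched as follows. This is a toy reimplementation for illustration, not fluent-plugin-metricsense itself: events arriving within the same minute are summed per (metric, segment), so only one record per minute per series goes upstream for total aggregation.

```ruby
# Sum event values into per-minute buckets keyed by (minute, metric, segment).
# In the real pipeline Fluentd would flush these sums upstream every minute.
def partial_aggregate(events)
  events.each_with_object(Hash.new(0)) do |ev, sums|
    minute = ev[:time] / 60  # integer division: one bucket per minute
    sums[[minute, ev[:metric], ev[:segment]]] += ev[:value]
  end
end

events = [
  { time: 0,  metric: 'import.size', segment: 'a001', value: 10 },
  { time: 30, metric: 'import.size', segment: 'a001', value: 22 },
  { time: 90, metric: 'import.size', segment: 'a002', value: 5  },
]
partial_aggregate(events)
# minute 0 of 'import.size'/'a001' sums to 32; minute 1 of 'import.size'/'a002' stays 5
```

Because the sums are associative, each Fluentd node can aggregate its own minute locally and the upstream store only merges pre-summed records, which is what makes the aggregation decentralized.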
API
MetricSense.value(:size => 32)
MetricSense.segment(:account => 1)
MetricSense.fact(:path => '/path1')
MetricSense.measure!
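The MetricSense API shown above can be illustrated with a toy stand-in. This module is an assumption for demonstration, not the real metricsense.gem: it just buffers values, segments, and facts, then flushes them as one event on measure!.

```ruby
# Toy stand-in for the MetricSense client API (not the real gem).
# value/segment/fact accumulate fields; measure! flushes one event.
module MetricSense
  @pending = {}
  @flushed = []  # in the real gem this would be sent to the local Fluentd

  class << self
    attr_reader :flushed

    def value(fields)
      @pending.merge!(fields)
    end

    def segment(fields)
      (@pending[:segments] ||= {}).merge!(fields)
    end

    def fact(fields)
      (@pending[:facts] ||= {}).merge!(fields)
    end

    # Emit the buffered event and start a fresh one.
    def measure!
      @flushed << @pending
      @pending = {}
    end
  end
end

MetricSense.value(:size => 32)
MetricSense.segment(:account => 1)
MetricSense.fact(:path => '/path1')
MetricSense.measure!
```

The DRY property follows from this shape: an application adds one `value`/`measure!` line where the metric originates, and no central configuration has to be touched.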
What did we learn?
> We always have lots of tasks;
  we need data to prioritize them.
> Problems are usually complicated;
  we need data to save time.
> Adding metrics should be DRY;
  otherwise it becomes tedious and you stop adding metrics.
> Realtime analysis is useful, but we still need batch analysis.
  > "who is not issuing queries, despite storing data last month?"
  > "which pages did users look at before sign-up?"
  > "which pages did users not look at before running into trouble?"
We open sourced MetricSense
https://github.com/treasure-data/metricsense
Components of MetricSense
metricsense.gem
> client library for Ruby to send metrics
fluent-plugin-metricsense
> plugin for Fluentd to collect metrics
> pluggable backends:
  > Librato Metrics backend
  > RDBMS backend
RDB backend for MetricSense
Aggregate metrics on an RDBMS in a form optimized for time-series data.
> Borrowed concepts from OpenTSDB and the OLAP cube.
Each data row stores one hour of one series: 60 one-minute values (m0..m59).

data:
  base_time  metric_id  segment_id  m0  m1  m2  ...  m59
  19:00      1          5           25  31  19  ...  21
  21:00      2          5           75  94  68  ...  72
  21:00      2          6           63  82  55  ...  63

metric_tags:
  metric_id  metric_name    segment_name
  1          "import.size"  NULL
  2          "import.size"  "account"

segment_values:
  segment_id  name
  5           "a001"
  6           "a002"
Solution v3 (future work):
Alerting using historical data
> simple machine learning to adjust thresholds
[Chart: measured values diverging from the historical average trigger an alert.]
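One simple way to alert against a historical average, sketched below as an assumption about what such a v3 could look like (not the actual implementation): flag a value as anomalous when it deviates from the mean of recent samples by more than k standard deviations, so the threshold adjusts itself as the history changes.

```ruby
# Flag `value` as anomalous if it deviates from the mean of `history`
# by more than k standard deviations (k is a tunable sensitivity).
def anomaly?(history, value, k: 3.0)
  mean = history.sum.to_f / history.size
  variance = history.sum { |v| (v - mean)**2 } / history.size
  stddev = Math.sqrt(variance)
  (value - mean).abs > k * stddev
end

history = [100, 102, 98, 101, 99, 100, 103, 97]  # mean 100, stddev ~1.87
anomaly?(history, 101)  # => false (within 3 stddevs)
anomaly?(history, 250)  # => true  (far above historical average)
```

The appeal over a fixed threshold is that no constant needs hand-tuning per metric; the per-minute sums already collected by v2 provide the history.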
We're Hiring!
Sales Engineer
Evangelize TD/Fluentd. Get everyone excited!
Help customers deploy and maintain TD successfully.
Preferred experience: OS, DB, BI, statistics and data science
Devops Engineer
Development, operation and monitoring of our large-scale, multi-tenant system
Preferred experience: large-scale system development and management
Competitive salary + equity package
Who we want
STRONG business and customer support DNA
Everyone is equally responsible for customer support. Customer success = our success.
Self-disciplined and responsible. Be your own manager.
Team player with excellent communication skills. Distributed team and global customer base.
Contact me: [email protected]