Distributed Stream Processing on Fluentd / #fluentd


Description: at Fluentd meetup in Japan, 2012/02/04

Transcript of Distributed Stream Processing on Fluentd / #fluentd

  • 1. Distributed Stream Processing on Fluentd / #fluentd (2012/02/04)
  • 2. (no transcribed text)
  • 3. Working at NHN Japan. We are hiring!
  • 4. What we are doing about logs with fluentd: data mining and reporting (page views, unique users, traffic amount per page, ...)
  • 5. What we are doing about logs with fluentd: super-large-scale "sed | grep | wc"-like processes
  • 6. Why fluentd? (not Storm, Kafka or Flume?) Ruby, Ruby, Ruby! (NOT Java!): we work in a lightweight-language culture, so it is easy to try and easy to patch; plugin model architecture; built-in TimeSlicedOutput mechanism
  • 7. What I'll talk about today: what we are trying with fluentd; how we did it, and how we are doing it now; what distributed stream-processing topologies look like; what is important about stream processing; implementation details (appendix)
  • 8. Architecture in last week's presentation. [diagram: web servers send logs to deliver servers (scribed), which send them both to archive servers (scribed, large-volume RAID) and, as a stream, to the Fluentd cluster; past logs are imported and converted on demand, as a batch; the Fluentd cluster converts logs into structured data and writes them to HDFS as a stream; Hadoop/Hive clusters answer aggregation queries from the Shib web client]
  • 9. Now. [the same diagram, except that the deliver servers run Fluentd instead of scribed, and a Fluentd Watcher has been added]
  • 10. Fluentd in production service: 10 days
  • 11. Scale of Fluentd processes: 146 log streams from 127 web servers
  • 12. Scale of Fluentd processes: 70,000 messages/sec, 120 Mbps (at peak time)
  • 13. Scale of Fluentd processes: 650 GB/day (non-blog: 100 GB)
  • 14. Scale of Fluentd processes: 89 fluentd instances on 12 nodes (4-core HT)
  • 15. We can't go back. (photo: crouton by kbysmnr)
  • 16. What we are trying with fluentd: log conversion, from raw logs (Apache combined-like format) to structured, query-friendly logs (TAB-separated, some fields masked, many flags added)
  • 17. What we are trying with fluentd: log conversion.
    from: 99.999.999.99 - - [03/Feb/2012:10:59:48 +0900] "GET /article/detail/6246245/ HTTP/1.1" 200 17509 "http://news.livedoor.com/topics/detail/6246245/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.1; .NET4.0C)" "news.livedoor.com" "xxxxxxx.xx.xxxxxxx.xxx" "-" 163266152930
    to: news.livedoor.com /topics/detail/6242972/ GET 302 210 226 - 99.999.999.99 TQmljv9QtXkpNtCSuWVGGg Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A406 Safari/7534.48.3 TRUE TRUE FALSE FALSE FALSE FALSE FALSE
    fields: hhmmdd vhost path method status bytes duration referer rhost userlabel agent FLAG [FLAGS]. FLAGS: status_redirection, status_errors, rhost_internal, suffix_miscfile, suffix_imagefile, agent_bot. FLAG: logical OR of the FLAGS. userlabel: hash of (tracking cookie / terminal id (mobile phone) / rhost+agent)
  • 18. TimeSlicedOutput in fluentd: traditional log rotation is important, but troublesome. We want a log from 2/3 23:59:59 to land in access.0203_23.log, and a log from 2/4 00:00:00 to land in access.0204_00.log. (A config sketch follows slide 19.)
  • 19. How we did it, and how we are doing it now: collect → archive → convert → aggregate → show
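(As an aside, not from the deck: the behavior slide 18 asks for is what fluentd's built-in TimeSlicedOutput mechanism provides, sketched here with the built-in out_file plugin. The match tag and path are hypothetical. Records are bucketed by their own timestamp, so a 2/3 23:59:59 record lands in the 0203_23 slice even if it arrives after midnight.)

    <match converted.blog>
      type file
      path /var/log/converted/access   # slice files come out as access.0203_23*.log
      time_slice_format %m%d_%H        # one slice per hour, as slide 18 wants
      time_slice_wait 10m              # grace period for records that arrive late
    </match>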
  • 20. How we did it in the past (2011): collect (scribed) → stream → archive (scribed) → store to HDFS; convert (Hadoop Streaming) by hourly invocation, with a 20-25 minute running time; aggregate (Hive) on demand; show on demand. HIGH LATENCY: hourly/daily (time to flush + hourly invocation + conversion running time).
  • 21. How we are doing it now: collect (Fluentd) → stream → archive (scribed); → stream → convert (Fluentd) → stream → store to HDFS (over Cloudera's Hoop); aggregate (Hive) on demand; show on demand. VERY LOW LATENCY: 2-3 minutes (only the time to wait for the flush).
  • 22. break. (photo: crouton by kbysmnr)
  • 23. What is important about stream processing: reasonable efficiency (compared with batch throughput); ease of re-running the same conversion as a batch; no SPOF; ease of adding/removing nodes
  • 24. Stream processing and batch: how do we re-run a conversion as a batch when we hit trouble? We want to use just one converter program for both stream processes and batch processes!
  • 25. out_exec_filter (fluentd built-in plugin): 1. forks and execs a command program; 2. writes data to the child process's stdin as TAB-separated fields specified by in_keys (for the tag, remove_prefix is available); 3. reads data from the child process's stdout as TAB-separated fields named by out_keys (for the tag, add_prefix is available); 4. sets the message's timestamp from the time_key value in the parsed data, in the format specified by time_format
  • 26. out_exec_filter and Hadoop Streaming: both read from stdin and write to stdout, with TAB-separated values as input/output. WOW!!!!!!! Difference: a tag field may be needed with out_exec_filter. Simple solution: if it does not exist, ignore it. (A converter sketch follows slide 33.)
  • 27. What is important about stream processing: reasonable efficiency (compared with batch throughput); ease of re-running the same conversion as a batch; no SPOF; ease of adding/removing nodes
  • 28. What are distributed stream-processing topologies like? [diagram: web servers → deliver servers → worker servers → serializer servers → HDFS (Hoop server), with archiver and backup servers beside the delivers] Redundancy and load balancing MUST be guaranteed everywhere.
  • 29. Deliver nodes: accept connections from the web servers, copy messages and send them to 1. the archiver (and its backup), 2. the convert workers (with load balancing), and 3. ...; useful for casual worker addition/removal
  • 30. Worker nodes: under load balancing, run as many workers as you want
  • 31. Serializer nodes: receive the converted data streams from the workers, aggregate them by service, and 1. write them to storage (HDFS via Hoop), and 2. ...; useful for reducing the storage overhead of many concurrent write operations
  • 32. Watcher nodes: watch the data for real-time workload reporting and trouble notifications; 1. raw data from the delivers, 2. structured data from the serializers
  • 33. break. (photo: crouton by kbysmnr)
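(A hedged sketch of the single-converter idea from slides 24-26, not the talk's actual convert.sh: a Ruby filter that reads TSV on stdin and writes TSV on stdout, so the same program can run under out_exec_filter and as a Hadoop Streaming mapper. The parsing and output fields are illustrative; the tag handling is slide 26's "if not exists, ignore" trick.)

    #!/usr/bin/env ruby
    # Minimal Apache-combined-like parser; the real converter also masks
    # fields and computes the flags listed on slide 17.
    COMBINED = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d+) (\S+)/

    def convert(raw)
      m = COMBINED.match(raw) or return nil
      rhost, time, method, path, status, bytes = m.captures
      [time, path, method, status, bytes, rhost]   # illustrative field set
    end

    STDIN.each_line do |line|
      cols = line.chomp.split("\t")
      # out_exec_filter writes "tag<TAB>message"; Hadoop Streaming writes the
      # raw message only. If the tag column is absent, simply ignore the
      # difference (slide 26).
      tag = cols.size > 1 ? cols.shift : nil
      converted = convert(cols.join("\t")) or next
      converted.unshift(tag) if tag   # echo the tag back only in stream mode
      puts converted.join("\t")
    end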
  • 34. Implementation details: log agents on the servers (scribeline); deliver (copy, in_scribe, out_scribe, out_forward); worker (in/out_forward, out_exec_filter); serializer/hooper (in/out_forward, out_hoop); watcher (in_forward, out_flowcounter, out_growthforecast)
  • 35. Log agent: scribeline, a log delivery agent tool (Python 2.4, scribe/thrift). Easy to set up and start/stop; keeps working across httpd configuration updates; works with logrotate-ed log files; automatic delivery-target failover/takeback; (NEW) cluster support (random selection from a server list). https://github.com/tagomoris/scribe_line
  • 36. From scribeline to deliver. [diagram: scribeline on the web servers sends scribe messages (category: blog, message: RAW LOG (Apache combined + α)) to the fluentd on the primary deliver server (in_scribe), failing over to the secondary deliver server (in_scribe)]
  • 37. From scribeline to deliver. [diagram: NN web servers → deliver 01 (primary), deliver 02 (secondary), deliver 03 (primary for high-throughput nodes); 8 fluentd instances per deliver node]
  • 38. (same as slide 36)
  • 39. Deliver node internal routing. [diagram: each of the 8 fluentd instances on a deliver server takes in_scribe input (category: blog, message: RAW LOG) and copies scribe.* to: out_scribe toward the archiver (host archive.server.local, remove_prefix scribe, add_prefix scribe, remove_newline true, add_newline true); out_flowcounter (see later); and roundrobin (see next) into out_forward (see later, with out_flowcounter). Records: time: received_at, tag: scribe.blog, message: RAW LOG]
  • 40. Deliver node: roundrobin strategy to the workers. roundrobin over 56 substore configurations (7 workers x 8 instances), each an out_forward with a secondary: server worker01 port 24211 (secondary: worker02 port 24211); worker01 port 24212 (secondary: worker03 port 24212); worker01 port 24213 (secondary: worker04 port 24213); worker01 port 24214 (secondary: worker05 port 24214); ... (A config sketch follows slide 43.)
  • 41. From deliver to worker. [diagram: the deliver fluentd (copy scribe.* → roundrobin → out_forward) sends each record (time: received_at, tag: scribe.blog, message: RAW LOG) to worker fluentd Xn1 on worker server X, or to worker fluentd Yn2 on worker server Y, both via in_forward]
  • 42. Worker node internal routing. [diagram: each worker server runs 8 worker instances and 1 serializer instance. Worker fluentd: in_forward → out_exec_filter on scribe.* (command: convert.sh; in_keys: tag,message; remove_prefix scribe; out_keys: .......; add_prefix: converted; time_key: time; time_format: %Y%m%d%H%M%S) → out_forward converted.* to the serializer. Serializer fluentd: in_forward → out_hoop converted.blog (hoop_server servername.local, username, path /on_hdfs/%Y%m%d/blog-%H.log) and out_hoop converted.news (path /on_hdfs/%Y%m%d/news-%H.log). Input records: time: received_at, tag: scribe.blog, message: RAW LOG; output records: time: written_time, tag: converted.blog, TAB-separated text data (many data fields)]
  • 43. out_exec_filter (review): 1. forks and execs a command program; 2. writes data to the child process's stdin as TAB-separated fields specified by in_keys (for the tag, remove_prefix is available); 3. reads data from the child process's stdout as TAB-separated fields named by out_keys (for the tag, add_prefix is available)
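(To make slides 42-43 concrete, a hedged sketch of the worker's out_exec_filter match, using only the parameters named on those slides. The command path and the out_keys list are hypothetical: slide 42 elides them, and the real out_keys would carry the slide-17 field set.)

    <match scribe.*>
      type exec_filter
      command /usr/local/bin/convert.sh   # converter path is an assumption
      in_keys tag,message                 # written to convert.sh's stdin
      remove_prefix scribe
      out_keys time,vhost,path,method,status,bytes   # hypothetical subset
      add_prefix converted                # scribe.blog -> converted.blog
      time_key time
      time_format %Y%m%d%H%M%S
    </match>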
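(And the deliver side from slides 40-41, equally hedged: a roundrobin output whose substores are out_forward blocks, each paired with a distinct secondary. Only two of the 56 substores are shown; hosts and ports are taken from slide 40.)

    <match scribe.*>
      type roundrobin
      <store>
        type forward
        <server>
          host worker01
          port 24211
        </server>
        <secondary>
          type forward
          <server>
            host worker02
            port 24211
          </server>
        </secondary>
      </store>
      <store>
        type forward
        <server>
          host worker01
          port 24212
        </server>
        <secondary>
          type forward
          <server>
            host worker03
            port 24212
          </server>
        </secondary>
      </store>
      # ... 54 more substores, one per worker instance
    </match>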