Singer, Pinterest's Logging Infrastructure
![Page 1: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/1.jpg)
![Page 2: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/2.jpg)
Krishna Gade, Data Engineering Manager
Discover Pinterest: Big Data and Apache Mesos
![Page 3: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/3.jpg)
Connor Doyle
Mesosphere
Roger Wang
Bernardo Gomez Palacio
Guavus
![Page 4: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/4.jpg)
Pinterest is a data product.
![Page 5: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/5.jpg)
A/B Experimentation
Promoted Pins
Product Insights
Spam Control
Related Pins
Home Feed
Search Quality
DATA
![Page 6: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/6.jpg)
Numbers
• > 30 billion Pins
• 10 billion messages-a-day logged to Kafka
• 10 petabytes of data in S3
• Ingest 20 terabytes of new data each day
• Petabyte-a-day processed in Hadoop
• 6 Hadoop clusters of 3000 nodes in AWS
• Over 100 regular users running over 2,000 jobs each day
![Page 7: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/7.jpg)
4x Data Growth
![Page 8: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/8.jpg)
Data Architecture Overview
pins
repins, likes
impressions
Kafka
App
Storm
Hadoop
Singer
HBase
Redshift
Insights
Features
![Page 9: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/9.jpg)
Roadmap
• Switch to Kafka 0.8 for all data streams
• Invest in scalable stream processing for realtime insights and products
• Migrate to a robust Hadoop 2.0 platform
• Experiment with Spark, especially for machine learning
• Unified batch and stream compute framework
![Page 10: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/10.jpg)
Roger Wang, Software Engineer
Singer: A High-Performance Logging Infrastructure
![Page 11: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/11.jpg)
Logging Infrastructure before Singer
[Diagram components: hosts, each running apps and a kafka agent; Kafka cluster; Kafka consumer; Storm; Kafka copier; S3; Hadoop cluster]
![Page 12: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/12.jpg)
Logging Infrastructure with Singer
[Diagram components: hosts running apps with a kafka agent; a host running apps with a singer agent; Kafka cluster; Kafka consumer; Storm; Secor; S3; Hadoop cluster]
![Page 13: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/13.jpg)
Singer Logging Agent
• Simple logging mechanism for applications
  • Decouple applications from the log repository
  • Support existing applications that log to disk
  • Isolate applications from Singer agent failure
  • Isolate applications from log repository failure
  • Avoid internal buffering and log loss
• Better resource usage
  • Connection consolidation
  • Flexible batching
![Page 14: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/14.jpg)
Singer Features
•At-least-once delivery
•Configurable, adaptive log latency through periodic tailing
•Dynamically discover new log streams
•Dynamically pick up new log configuration
•Pluggable log stream reader
•Pluggable log stream writer
•Rich set of stats via Ostrich
![Page 15: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/15.jpg)
Singer Architecture
[Diagram components: log configuration; configuration watcher; LogStream monitor; LogStream processors A-1, A-2, B-1, and C-1, each shown as a Reader/Writer pair; log repository]
![Page 16: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/16.jpg)
Singer Concepts and Components
•LogStream/LogFile
•LogPosition
•LogStreamMonitor
•LogStreamProcessor
•LogStreamReader/LogFileReader
•LogStreamWriter
![Page 17: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/17.jpg)
Log Stream Monitor
[Diagram components: LogStream monitor; periodic task; LogStream registrar; log streams A-1, B-1, B-2, and an empty log stream, shown with their processors and stats]
![Page 18: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/18.jpg)
Log Stream Processor
[Flowchart: processing a batch]
1. Refresh LogStream
2. Load position and seek reader
3. Reader: next batch
4. EOS?
   No: Writer processes the batch, commit position, then read the next batch
   Yes: update stats, calculate next processing time, schedule next processing cycle
Steps that fail abort the cycle on exception.
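The cycle above can be sketched roughly as follows. This is a minimal Python illustration, not Singer's code: `ListReader`, `ListWriter`, `PositionStore`, and `process_cycle` are hypothetical names, positions are plain list indexes rather than inode plus byte offset, and committing the position only after a successful write is an assumption about how at-least-once delivery would be achieved.

```python
from typing import List


class ListReader:
    """Hypothetical reader over an in-memory list of messages."""

    def __init__(self, messages: List[bytes]):
        self.messages = messages
        self.position = 0

    def seek(self, position: int) -> None:
        self.position = position

    def next_batch(self, max_size: int) -> List[bytes]:
        batch = self.messages[self.position:self.position + max_size]
        self.position += len(batch)
        return batch  # an empty batch signals end of stream (EOS)


class ListWriter:
    """Hypothetical writer that collects batches in memory."""

    def __init__(self) -> None:
        self.written: List[bytes] = []

    def write(self, batch: List[bytes]) -> None:
        self.written.extend(batch)


class PositionStore:
    """Hypothetical holder for the last committed position."""

    def __init__(self) -> None:
        self.position = 0


def process_cycle(reader, writer, store, batch_size: int = 100) -> int:
    """Run one processing cycle and return the number of messages written.

    An exception from the reader or writer propagates and aborts the cycle.
    The store still holds the last committed position, so the next cycle
    re-reads the uncommitted batch (at-least-once delivery).
    """
    reader.seek(store.position)            # load position and seek reader
    processed = 0
    while True:
        batch = reader.next_batch(batch_size)
        if not batch:                      # EOS: nothing more this cycle
            return processed
        writer.write(batch)                # process batch
        store.position = reader.position   # commit only after the write
        processed += len(batch)
```

If the writer fails midway, the store keeps the position of the last batch that was written, and a later call to `process_cycle` resumes from there. A message can therefore be delivered twice, but not skipped.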
![Page 19: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/19.jpg)
Adaptive Log Processing Interval
No message: next cycle = min(MaxInterval, 2 * current interval)
≥ 1 message: next cycle = MinInterval
The interval always stays within [MinInterval, MaxInterval].
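The backoff rule can be written as a small function. This is a sketch under assumptions: the name `next_interval` and its parameters are hypothetical, intervals are in seconds, and any cycle that read at least one message is treated as the reset case.

```python
def next_interval(current: float, messages_read: int,
                  min_interval: float, max_interval: float) -> float:
    """Return the delay before the next processing cycle, in seconds."""
    if messages_read == 0:
        # Nothing new: back off exponentially, capped at max_interval.
        return min(max_interval, 2 * current)
    # New data arrived: return to the shortest interval.
    return min_interval
```

With `min_interval=1` and `max_interval=8`, an idle stream is polled after 1, 2, 4, 8, 8, ... seconds, and drops back to 1 second as soon as a cycle reads data.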
![Page 20: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/20.jpg)
Pluggable Log Stream Reader
LogFileReader produces LogMessage with LogPosition
LogMessage: {key: <binary>; timestamp: <timestamp>; message: <binary>}
LogPosition: inode + byte offset
![Page 21: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/21.jpg)
Log Message
Envelope thrift message passed between Reader and Writer:
key (binary): Uninterpreted binary used to co-locate messages, for example a session id, so that all log entries in a session land on the same partition. No serde cost.
timestamp: nanoseconds
message (binary): Uninterpreted binary data, for example a text log line, a Thrift message, or a file path. No serde cost.
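A minimal sketch of the envelope follows, written as a Python dataclass rather than the Thrift definition the slides refer to. The `partition_for` helper is hypothetical and is not Kafka's partitioner. It only illustrates that a partition can be derived from the raw key bytes without decoding them.

```python
import time
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LogMessage:
    key: bytes        # uninterpreted; used only to co-locate messages
    timestamp: int    # nanoseconds
    message: bytes    # uninterpreted payload: text line, Thrift bytes, file path


def partition_for(msg: LogMessage, num_partitions: int) -> int:
    """Hypothetical partitioner: hash the raw key bytes, never decode them."""
    return zlib.crc32(msg.key) % num_partitions


first = LogMessage(b"session-42", time.time_ns(), b"GET /pin/1")
second = LogMessage(b"session-42", time.time_ns(), b"GET /pin/2")
# Same key, so both entries map to the same partition.
assert partition_for(first, 8) == partition_for(second, 8)
```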
![Page 22: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/22.jpg)
Log Position
● Caching can give the wrong byte offset
● Implement a generic buffered Java InputStream which tracks byte offsets
● Restriction: the reader should not cache or read ahead

LogFile (inode): next log file to read from
byteOffset (byte offset from head of file): next byte to read from the file
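The offset problem can be illustrated with a small buffered reader that counts only the bytes handed to the caller. The slides describe a Java InputStream; the Python class below is a hypothetical stand-in that shows the idea for newline-delimited messages, not the actual implementation.

```python
import io
from typing import BinaryIO, Optional


class OffsetTrackingReader:
    """Buffered line reader whose offset counts only consumed bytes."""

    def __init__(self, raw: BinaryIO, start_offset: int = 0,
                 chunk_size: int = 4096):
        self.raw = raw
        self.raw.seek(start_offset)
        self.offset = start_offset   # byte offset of the next unread byte
        self.chunk_size = chunk_size
        self._buf = b""

    def read_line(self) -> Optional[bytes]:
        """Return the next complete line, or None if none is available yet."""
        while b"\n" not in self._buf:
            chunk = self.raw.read(self.chunk_size)
            if not chunk:
                return None          # partial line stays buffered; offset unchanged
            self._buf += chunk
        line, _, self._buf = self._buf.partition(b"\n")
        line += b"\n"
        self.offset += len(line)     # advance by consumed bytes only
        return line


stream = io.BytesIO(b"one\ntwo\npart")
reader = OffsetTrackingReader(stream)
reader.read_line()
# The underlying stream has already been read to the end (position 12),
# but only 4 bytes were consumed, so 4 is the safe offset to commit.
assert (reader.offset, stream.tell()) == (4, 12)
```

Committing `stream.tell()` instead of `reader.offset` would skip the buffered but unprocessed data after a restart, which is the failure mode the first bullet warns about.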
![Page 23: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/23.jpg)
Log Rotation
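Before rotation: log (inode 10), log.1 (11), log.2 (12), log.3 (13), log.4 (14), log.5 (15), log.6 (16), log.7 (17)
After rotation: log (inode 18), log.1 (10), log.2 (11), log.3 (12), log.4 (13), log.5 (14), log.6 (15), log.7 (16)
1. Use the inode to identify a log file.
2. Check the inode<->filename mapping when opening a file by name.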
![Page 24: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/24.jpg)
Duplicate inodes
[Diagram: files log, log.1 ... log.7 and their inodes before (10 to 17) and after (18, 10 to 16) rotation]
Skip the cycle to wait for log rotation.
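A rough sketch of inode-based lookup follows. The function name `inode_map` and the prefix-matching rule are assumptions for illustration. The sketch treats two names reporting the same inode as a sign that the directory listing raced with a rotation, and signals the caller to skip the cycle.

```python
import os
from typing import Dict, Optional


def inode_map(log_dir: str, prefix: str) -> Optional[Dict[int, str]]:
    """Map inode -> path for files in log_dir whose name starts with prefix.

    Returns None when two names report the same inode, so the caller can
    skip this cycle and retry once rotation has finished.
    """
    mapping: Dict[int, str] = {}
    for name in sorted(os.listdir(log_dir)):
        if not name.startswith(prefix):
            continue
        path = os.path.join(log_dir, name)
        try:
            inode = os.stat(path).st_ino
        except FileNotFoundError:
            continue                 # file was renamed after listing
        if inode in mapping:
            return None              # duplicate inode: skip this cycle
        mapping[inode] = path
    return mapping
```

A reader that stored inode 10 as its position can look it up in the returned mapping after rotation and find that the file is now named `log.1`, then continue from the saved byte offset.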
![Page 25: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/25.jpg)
Log File Reader Caveats
• Corrupted block
• Partial LogMessage
• The log file reader is kept open between processing cycles to avoid the cost of reopening the file
![Page 26: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/26.jpg)
Pluggable Log Stream Writer
•The writer interprets the LogMessage
•Examples:
  • The log archiver interprets the message as a file path
  • The Kafka writer creates a Kafka message without deserializing the content in the envelope
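The two examples can be sketched as implementations of one writer interface. Everything here is hypothetical: `LogStreamWriter` borrows its name from the slides but its `write` signature is assumed, and `PassThroughWriter` is an in-memory stand-in for a Kafka writer that uses no Kafka client API.

```python
from typing import Iterable, List, Protocol, Tuple


class LogStreamWriter(Protocol):
    """Assumed interface: a writer receives a batch of raw payloads."""

    def write(self, messages: Iterable[bytes]) -> None: ...


class ArchiveWriter:
    """Hypothetical archiver: treats each payload as a UTF-8 file path."""

    def __init__(self) -> None:
        self.paths: List[str] = []

    def write(self, messages: Iterable[bytes]) -> None:
        for payload in messages:
            self.paths.append(payload.decode("utf-8"))


class PassThroughWriter:
    """Hypothetical stand-in for a Kafka writer: forwards bytes untouched."""

    def __init__(self, topic: str) -> None:
        self.topic = topic
        self.sent: List[Tuple[str, bytes]] = []

    def write(self, messages: Iterable[bytes]) -> None:
        for payload in messages:
            self.sent.append((self.topic, payload))  # no deserialization


def deliver(writer: LogStreamWriter, batch: List[bytes]) -> None:
    """The processor only sees the interface, not the concrete writer."""
    writer.write(batch)
```

The same batch of bytes means different things to each writer, which is why interpretation is left to the writer rather than to the reader or the envelope.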
![Page 27: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/27.jpg)
Log Configuration
Puppet master
Watcher: restart Singer on change
puppet agent
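A watcher of this kind could be sketched as a poller that compares a content hash of the configuration file and invokes a callback when it changes. The class name `ConfigWatcher`, the polling approach, and the use of SHA-256 are assumptions. The slides only state that Singer is restarted when the configuration changes.

```python
import hashlib
from typing import Callable, Optional


class ConfigWatcher:
    """Polls a config file and calls on_change when its content changes."""

    def __init__(self, path: str, on_change: Callable[[], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._digest = self._read_digest()

    def _read_digest(self) -> Optional[str]:
        try:
            with open(self.path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            return None              # a missing file counts as a distinct state

    def poll(self) -> bool:
        """Check once; return True if a change was detected."""
        digest = self._read_digest()
        if digest == self._digest:
            return False
        self._digest = digest
        self.on_change()             # for example, trigger a restart
        return True
```

Hashing the content rather than comparing modification times avoids a restart when the file is rewritten with identical content.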
![Page 28: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/28.jpg)
Singer Deployment
•Debian package: part of base image?
•Dynamic configuration update through Puppet
•Resource footprint enforced
•Rich stats exported through Ostrich to OpenTSDB
![Page 29: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/29.jpg)
Alternatives
•Scribe
•Logstash
•…
![Page 30: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/30.jpg)
What’s next?
•Resilient file format so that we can skip corrupted blocks
•Pluggable log processing policy
![Page 31: Singer, Pinterest's Logging Infrastructure](https://reader034.fdocuments.us/reader034/viewer/2022051207/540d7dbf8d7f72767e8b49db/html5/thumbnails/31.jpg)