TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data...
Transcript of TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data...
![Page 1: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/1.jpg)
TSAR A TimeSeries AggregatoR
Anirudh Todi | Twitter | @anirudhtodi | TSAR
![Page 2: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/2.jpg)
What is TSAR?
![Page 3: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/3.jpg)
What is TSAR?
TSAR is a framework and service infrastructure for specifying, deploying and operating timeseries aggregation jobs.
![Page 4: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/4.jpg)
TimeSeries Aggregation at Twitter
![Page 5: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/5.jpg)
TimeSeries Aggregation at TwitterA common problem ⇢Data products (analytics.twitter.com, embedded
analytics, internal dashboards) ⇢Business metrics ⇢Site traffic, service health, and user engagement
monitoring !
Hard to do at scale ⇢10s of billions of events/day in real time
!
Hard to maintain aggregation services once deployed - ⇢Complex tooling is required
![Page 6: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/6.jpg)
TimeSeries Aggregation at Twitter
![Page 7: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/7.jpg)
TimeSeries Aggregation at TwitterMany time-series applications look similar ⇢ Common types of aggregations ⇢ Similar service stacks
!
Multi-year effort to build general solutions ⇢Summingbird - abstraction library for generalized
distributed computation !
TSAR - an end-to-end aggregation service built on Summingbird ⇢Abstracts away everything except application's data
model and business logic
![Page 8: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/8.jpg)
A typical aggregationEvent logs
Extract
Group
Measure
Store
![Page 9: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/9.jpg)
[ (“/“ , 300, iPhone,…) , (“/favorites”, 300, iPhone,…) , (“/replies” , 300, Anroid,…) , (“/“ , 200, Web,…) , (“/favorites”, 200, Web,…) , (“/“ , 200, iPhone,…) ]
![Page 10: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/10.jpg)
[ (“/“ , 300) , (“/favorites”, 300) , (“/replies” , 300) , (“/“ , 200) , (“/favorites”, 200) , (“/“ , 200) ]
![Page 11: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/11.jpg)
[ (“/“ , [ (“/“, 300) , (“/“, 200 ) , (“/“, 200 ) ] ) , (“/favorites” , [ (“/favorites”, 300) , (“/favorites”, 300) ] ) , (“/replies” , [ (“/replies”, 300) ] ) ]
![Page 12: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/12.jpg)
[ (“/“ , 3) , (“/favorites”, 2) , (“/replies” , 1) ]
![Page 13: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/13.jpg)
key-value
Vertica
![Page 14: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/14.jpg)
Example: API aggregates
![Page 15: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/15.jpg)
Example: API aggregates!
⇢Bucket each API call !
⇢Dimensions - endpoint, datacenter, client application ID !
⇢Compute - total event count, unique users, mean response time etc !
⇢Write the output to Vertica
![Page 16: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/16.jpg)
Example: Impressions by Tweet
![Page 17: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/17.jpg)
Example: Impressions by Tweet
!
⇢Bucket each impressions by tweet ID !
⇢Compute total count, unique users !
⇢Write output to a key-value store !
⇢Expose output via a high-SLA query service !
⇢Write sample of data to Vertica for cross-validation
![Page 18: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/18.jpg)
Problems
![Page 19: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/19.jpg)
Problems!
⇢Service interruption: Can we retrieve lost data? !
⇢Data schema coordination: Store output as log data, in a key-value data store, in cache, and in relational databases !
⇢Flexible schema change !
⇢Easy to backfill and update/repair historical data !
Most important: Solve these problems in a general way
![Page 20: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/20.jpg)
TSAR’s design principles
![Page 21: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/21.jpg)
TSAR’s design principles
1) Hybrid computation: Build on Summingbird, process each event twice - in real time & in batch (at a later time) !
⇢Gives stability and reproducibility of batch ⇢Streaming (recency) of realtime
!
Leverage the Summingbird ecosystem: ⇢Abstraction framework over computing platforms ⇢Rich library of approximation monoids (Algebird) ⇢Storage abstractions (Storehaus)
!
!
![Page 22: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/22.jpg)
![Page 23: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/23.jpg)
TSAR’s design principles
![Page 24: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/24.jpg)
TSAR’s design principles
2) Separate event production from event aggregation !
User specifies how to extract events from source data !
Bucketing and aggregating events is managed by TSAR !
!
![Page 25: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/25.jpg)
TSAR’s design principles
![Page 26: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/26.jpg)
TSAR’s design principles
3) Unified data schema: ⇢Data schema specified in datastore-independent way ⇢Managed schema evolution & data transformation
!
Store data on: ⇢HDFS ⇢Manhattan (key-value) ⇢Vertica/MySQL ⇢Cache
!
Easily extensible to other schemas (Cassandra, HBase, etc)
![Page 27: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/27.jpg)
Data Schema Coordination
Event Log
Kafka Spout
Intermediate Aggregates
Manhattan
Vertica
• Schema consistent across stores • Update stores when schema evolves
![Page 28: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/28.jpg)
TSAR’s design principles
![Page 29: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/29.jpg)
TSAR’s design principles
4) Integrated service toolkit !
⇢One-stop deployment tooling !
⇢Data warehousing !
⇢Query capability !
⇢Automatic observability and alerting !
⇢Automatic data integrity checks
![Page 30: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/30.jpg)
Tweet Impressions in TSAR
![Page 31: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/31.jpg)
Tweet Impressions in TSAR⇢Annotate each tweet with an impression count
!
⇢Count = unique users who saw that tweet !
⇢Massive scalability challenge: • > 500MM tweets/day • tens of billions of impressions
!
⇢Want realtime updates !
⇢Production ready and robust
![Page 32: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/32.jpg)
A minimal Tsar project
Scala Tsar job Configuration File
Thrift IDL
![Page 33: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/33.jpg)
struct TweetAttributes { 1: optional i64 tweet_id }
Thrift IDL
![Page 34: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/34.jpg)
Tweet Impressions ExampleScala Tsar job
![Page 35: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/35.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } }
Scala Tsar job
![Page 36: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/36.jpg)
Tweet Impressions ExampleScala Tsar job
![Page 37: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/37.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } }
Scala Tsar job
![Page 38: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/38.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } }
Scala Tsar job
![Page 39: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/39.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } }
Dimensions for job aggregation
Scala Tsar job
![Page 40: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/40.jpg)
Tweet Impressions ExampleScala Tsar job
![Page 41: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/41.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Scala Tsar job
![Page 42: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/42.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Scala Tsar job
![Page 43: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/43.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Metrics to compute
Scala Tsar job
![Page 44: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/44.jpg)
Tweet Impressions ExampleScala Tsar job
![Page 45: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/45.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Scala Tsar job
![Page 46: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/46.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
What datastores to write to
Scala Tsar job
![Page 47: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/47.jpg)
Tweet Impressions ExampleScala Tsar job
![Page 48: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/48.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Scala Tsar job
![Page 49: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/49.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Scala Tsar job
![Page 50: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/50.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Summingbird fragment to describe event production.
Scala Tsar job
![Page 51: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/51.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Summingbird fragment to describe event production.
Scala Tsar job
![Page 52: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/52.jpg)
Tweet Impressions Exampleaggregate { onKeys( (TweetId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.tweetId) (event.timestamp, impr) } } !
Summingbird fragment to describe event production.
There is no aggregation logic specified here
Scala Tsar job
![Page 53: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/53.jpg)
Config( base = Base( namespace = 'tsar-examples', name = 'tweets', user = 'tsar-shared', thriftAttributesName = 'TweetAttributes', origin = '2014-05-15 00:00:00 UTC', ! jobclass = 'com.twitter.platform.analytics.examples.TweetJob', ! outputs = [ Output(sink = Sink.IntermediateThrift, width = 1 * Day), Output(sink = Sink.Manhattan1, width = 1 * Day) ], ...
Configuration File
![Page 54: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/54.jpg)
What has been specified?
![Page 55: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/55.jpg)
What has been specified?⇢Our event schema (in thrift)
!
⇢How to produce these events !
⇢Dimensions to aggregate on !
⇢Time granularities to aggregate on !
⇢Sinks (Manhattan / MySQL) to use !
![Page 56: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/56.jpg)
What do you not specify?
⇢How to represent the aggregated data !
⇢How to represent the schema in MySQL / Manhattan !
⇢How to perform the aggregation !
⇢How to locate and connect to underlying services (Hadoop, Storm, MySQL, Manhattan, …)
!
![Page 57: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/57.jpg)
Operational simplicity
End-to-end service infrastructure with a single command
$ tsar deploy --env=prod
⇢Launch Hadoop jobs ⇢Launch Storm jobs ⇢Launch Thrift query service ⇢Launch loader processes to load data into MySQL / Manhattan ⇢Mesos configs for all of the above ⇢Alerts for the batch & storm jobs and the query service ⇢Observability for the query service ⇢Auto-create tables and views in MySQL or Vertica ⇢Automatic data regression and data anomaly checks
![Page 58: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/58.jpg)
Bird’s eye view of the TSAR pipeline
![Page 59: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/59.jpg)
Seamless Schema evolutionScala Tsar job
![Page 60: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/60.jpg)
Seamless Schema evolutionBreak down impressions by the client application (Twitter for iPhone, Twitter for Android etc) !aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.client, event.tweetId) (event.timestamp, impr) } } !
Scala Tsar job
![Page 61: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/61.jpg)
Seamless Schema evolutionBreak down impressions by the client application (Twitter for iPhone, Twitter for Android etc) !aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.client, event.tweetId) (event.timestamp, impr) } } !
Scala Tsar job
![Page 62: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/62.jpg)
Seamless Schema evolutionBreak down impressions by the client application (Twitter for iPhone, Twitter for Android etc) !aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes(event.client, event.tweetId) (event.timestamp, impr) } } !
New aggregation dimension
Scala Tsar job
![Page 63: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/63.jpg)
Backfill tooling
![Page 64: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/64.jpg)
Backfill tooling
But what about historical data?
![Page 65: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/65.jpg)
Backfill tooling
But what about historical data?
tsar backfill —start=<start> —end=<end>
![Page 66: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/66.jpg)
Backfill tooling
But what about historical data?
tsar backfill —start=<start> —end=<end>
Backfill runs parallel to the production job !
Useful for repairing historical data as well
![Page 67: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/67.jpg)
Aggregating on different granularitiesConfiguration File
![Page 68: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/68.jpg)
Aggregating on different granularities
We have been computing only daily aggregates We now wish to add alltime aggregates
Configuration File
![Page 69: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/69.jpg)
Aggregating on different granularities
We have been computing only daily aggregates We now wish to add alltime aggregates
Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime)
Configuration File
![Page 70: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/70.jpg)
Aggregating on different granularities
We have been computing only daily aggregates We now wish to add alltime aggregates
Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime)
Configuration File
![Page 71: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/71.jpg)
Aggregating on different granularities
We have been computing only daily aggregates We now wish to add alltime aggregates
Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime)
New aggregation granularity
Configuration File
![Page 72: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/72.jpg)
Automatic metric computationScala Tsar job
![Page 73: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/73.jpg)
Automatic metric computationSo far, only total view counts. Now, add # unique users viewing each tweet !aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count, Unique(UserId) ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes( event.client, event.userId, event.tweetId ) (event.timestamp, impr) } }
Scala Tsar job
![Page 74: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/74.jpg)
Automatic metric computationSo far, only total view counts. Now, add # unique users viewing each tweet !aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count, Unique(UserId) ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes( event.client, event.userId, event.tweetId ) (event.timestamp, impr) } }
Scala Tsar job
![Page 75: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/75.jpg)
Automatic metric computationSo far, only total view counts. Now, add # unique users viewing each tweet !aggregate { onKeys( (TweetId), (TweetId, ClientApplicationId) ) produce ( Count, Unique(UserId) ) sinkTo (Manhattan) } fromProducer { ClientEventSource(“client_events”) .filter { event => isImpressionEvent(event) } .map { event => val impr = ImpressionAttributes( event.client, event.userId, event.tweetId ) (event.timestamp, impr) } }
New metric
Scala Tsar job
![Page 76: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/76.jpg)
Support for multiple sinksConfiguration File
![Page 77: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/77.jpg)
Support for multiple sinksSo far, only persisting data to Manhattan !
Persist data to MySQL as well
Configuration File
![Page 78: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/78.jpg)
Support for multiple sinksSo far, only persisting data to Manhattan !
Persist data to MySQL as well
Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Output(sink = Sink.MySQL, width = Alltime)
Configuration File
![Page 79: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/79.jpg)
Support for multiple sinksSo far, only persisting data to Manhattan !
Persist data to MySQL as well
Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Output(sink = Sink.MySQL, width = Alltime)
Configuration File
![Page 80: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/80.jpg)
Support for multiple sinksSo far, only persisting data to Manhattan !
Persist data to MySQL as well
Output(sink = Sink.Manhattan, width = 1 * Day) Output(sink = Sink.Manhattan, width = Alltime) Output(sink = Sink.MySQL, width = Alltime)
New sink
Configuration File
![Page 81: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/81.jpg)
Tsar WorkflowCreate Job
Deploy Job
Modify Job
Optional: Backfill Job
![Page 82: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/82.jpg)
TSAR Optimizations
![Page 83: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/83.jpg)
Cache TSAR PipelineLogs
Intermediate Thrift
Sink
Total Thrift
Cache filtered events
Cache aggregation results
SinkSink aggregate { onKeys ( (TweetIdField), // Total (ClientApplicationIdField), // Total (TweetIdField, ClientApplicationIdField) // Intermediate ) produce ( ... ) sinkTo ( ... )
![Page 84: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/84.jpg)
Covering Template aggregate { ! onKeys ( (TweetIdField), (ClientApplicationIdField), (TweetIdField, ClientApplicationIdField) // Covering Template ) produce ( ... ) sinkTo ( ... )
![Page 85: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/85.jpg)
Intermediate SinksConfig( base = Base( namespace = 'tsar-examples', name = 'tweets', user = 'tsar-shared', thriftAttributesName = 'TweetAttributes', origin = '2014-05-15 00:00:00 UTC', jobclass = 'com.twitter.platform.analytics.examples.TweetJob', outputs = [ Output(sink = Sink.IntermediateThrift, width = 1 * Day), Output(sink = Sink.TotalThrift, width = 1 * Day), Output(sink = Sink.Manhattan1, width = 1 * Day), Output(sink = Sink.Vertica, width = 1 * Day) ], …
![Page 86: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/86.jpg)
Trade space for timeLogs
Sink
Logs
Intermediate Thrift
Sink
Total Thrift
Logs
Intermediate Thrift
Sink SinkSinkSinkSinkSinkSink
![Page 87: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/87.jpg)
Manhattan - Key ClusteringReduces Manhattan queries for large time ranges
![Page 88: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/88.jpg)
Manhattan - Key Clustering2014-01-01 12:00:00
2014-01-01 13:00:00
2014-01-02 12:00:00
2014-01-02 13:00:00
2014-01-03 12:00:00
2014-01-03 12:00:00
2014-01-01
!
2014-01-02
!
2014-01-03
!
1 Day Clustering
![Page 89: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/89.jpg)
Manhattan - Key Clustering2014-01-01 12:00:00
2014-01-01 13:00:00
2014-01-02 12:00:00
2014-01-02 13:00:00
2014-01-03 12:00:00
2014-01-03 12:00:00
!
!
2014-01-01 to 2014-01-07
!
!
!
7 Day Clustering
![Page 90: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/90.jpg)
Value Packing - Key IndexerSome keys are always queried together
![Page 91: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/91.jpg)
Value Packing - Key Indexer!Naive approach:
! (CampaignIdField, CountryField) —-> Impressions !Would have to query: ! (CampaignIdField, USA) —-> Impressions_USA (CampaignIdField, UK) —-> Impressions_UK … !Better approach: ! (CampaignIdField) —> Map[CountryField -> Impressions] !Would have to query: ! (CampaignIdField) —> Map[USA -> Impressions_USA, UK -> Impressions_UK,…]
⇢Limits key fanout !
⇢Implicit index on CountryField - don’t need to know which countries to query for
!
![Page 92: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/92.jpg)
TSAR Visualization
![Page 93: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/93.jpg)
TSAR Visualization
![Page 94: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/94.jpg)
Conclusion: Three basic problems
![Page 95: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/95.jpg)
Conclusion: Three basic problems⇢Computation management
Describe and execute computational logic Specify aggregation dimensions, metrics and time granularities
⇢Dataset management Define, deploy and evolve data schemas Coordinate data migration, backfill and recovery
⇢Service management Define query services, observability, alerting, regression checks, coordinate deployment across all underlying services
TSAR gives you all of the above !
![Page 96: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/96.jpg)
Key Takeaway
![Page 97: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/97.jpg)
Key Takeaway
“The end-to-end management of the data pipeline is TSAR’s key feature. The user concentrates on the business logic.” !
![Page 98: TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business](https://reader033.fdocuments.us/reader033/viewer/2022050421/5f90bf3f18c1033d857d1c08/html5/thumbnails/98.jpg)
!
Thank you! !
Questions?@anirudhtodi ani @ twitter