How we built a 1T/day stream processing cloud platform in...
![Page 1: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/1.jpg)
A NETFLIX ORIGINAL SERVICE
How we built a 1T/day stream processing cloud platform in a year
![Page 2: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/2.jpg)
Peter Bakas | @peter_bakas
@ Netflix : Cloud Platform Engineering - Real Time Data Infrastructure
@ Ooyala : Analytics, Discovery, Platform Engineering & Infrastructure
@ Yahoo : Display Advertising, Behavioral Targeting, Payments
@ PayPal : Site Engineering and Architecture
@ Play : Advisor to Startups (Data, Security, Containers)
Who is this guy?
![Page 3: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/3.jpg)
• Architectural design and principles for Keystone
• Technologies that Keystone is leveraging
• Best practices
What should I expect?
![Page 4: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/4.jpg)
Let’s get down to business
![Page 5: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/5.jpg)
Netflix is a data driven company
Content
Product
Marketing
Finance
Business Development
Talent
Infrastructure
← Culture of Analytics →
![Page 6: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/6.jpg)
Netflix Culture - Freedom & Responsibility
‘It may well be the most important document ever to come out of the Valley’
Sheryl Sandberg, COO @ Facebook
![Page 7: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/7.jpg)
What does Freedom & Responsibility have to do with it?
![Page 8: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/8.jpg)
700 billion unique events ingested / day
1 trillion unique events / day at peak of last holiday season
1+ trillion events processed every day
11 million events ingested / sec @ peak
24 GB / sec @ peak
1.3 Petabyte / day
By the numbers
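As a rough sanity check on these figures, the daily totals can be converted to average per-second rates. This is back-of-the-envelope arithmetic only, assuming decimal units (1 PB = 10^15 bytes) and an 86,400-second day; the variable names are illustrative and not from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day_peak = 1_000_000_000_000  # 1 trillion events/day (holiday peak figure)
bytes_per_day = 1.3e15                   # 1.3 PB/day, assuming decimal petabytes

# Averages over a whole day; instantaneous peaks will be higher than these.
avg_events_per_sec = events_per_day_peak / SECONDS_PER_DAY
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY

print(f"average events/sec on a 1T day: {avg_events_per_sec / 1e6:.1f} million")
print(f"average throughput: {avg_bytes_per_sec / 1e9:.1f} GB/sec")
```

The daily averages land in the same order of magnitude as the quoted per-second peak figures, which is what one would expect if the slide's numbers are mutually consistent.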
![Page 9: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/9.jpg)
How did we get here?
![Page 10: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/10.jpg)
Chukwa
![Page 11: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/11.jpg)
Chukwa/Suro + Real-Time branch
![Page 12: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/12.jpg)
Stream Consumers
Samza Router
EMR
Fronting Kafka
Event Producer
Consumer Kafka
Control Plane
HTTP PROXY
Keystone
![Page 13: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/13.jpg)
• Best-effort delivery
• Prefer dropping messages to disrupting the producer app
• 99.99% delivery SLA
• Wraps Apache Kafka Producer
• Integration with Netflix ecosystem: Eureka, Atlas, etc.
Netflix Kafka Producer
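The "drop rather than disrupt the producer app" policy can be illustrated with a bounded, non-blocking buffer. This is a minimal standard-library sketch, not the Netflix producer wrapper: the class name `BestEffortBuffer` and its methods are hypothetical, and a real producer would hand buffered messages to a background sender rather than drain them on demand.

```python
import queue


class BestEffortBuffer:
    """Bounded in-memory buffer that drops messages instead of blocking the caller."""

    def __init__(self, max_size):
        self._queue = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def send(self, message):
        """Enqueue without blocking; count a drop and return False if the buffer is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Return all buffered messages (stand-in for a background sender)."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


buf = BestEffortBuffer(max_size=2)
results = [buf.send(f"event-{i}") for i in range(3)]
print(results, "dropped:", buf.dropped)
```

The calling application never waits on the buffer, so a slow or unavailable cluster costs dropped events rather than application latency.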
![Page 14: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/14.jpg)
• Using Netflix Platform logging API
• LogManager.logEvent(Annotatable): majority of the cases
• KeyValueSerializer with ILog#log(String)
• REST endpoint that proxies Platform logging
• ksproxy
• Prana sidecar
Producing Events to Keystone
![Page 15: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/15.jpg)
• GUID
• Timestamp
• Host
• App
Injected Event Metadata
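A sketch of what injecting these four fields might look like. The field names (`guid`, `timestamp_ms`, `host`, `app`, `payload`) and the function `inject_metadata` are assumptions for illustration; the talk does not show the actual envelope layout.

```python
import socket
import time
import uuid


def inject_metadata(payload, app_name):
    """Return a new event dict wrapping the payload with GUID, timestamp, host, and app."""
    return {
        "guid": str(uuid.uuid4()),                # unique per event, usable for dedup later
        "timestamp_ms": int(time.time() * 1000),  # wall-clock time at the producer
        "host": socket.gethostname(),
        "app": app_name,
        "payload": payload,
    }


event = inject_metadata({"action": "play"}, app_name="example-app")
print(sorted(event))
```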
![Page 16: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/16.jpg)
• Invisible to source & sinks
• Backwards and forwards compatibility
• Supports JSON; Avro on the horizon
• Efficient - 10 bytes overhead per message
• Message size - hundreds of bytes to 10MB
Keystone Extensible Wire Protocol
![Page 17: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/17.jpg)
Fronting Kafka Clusters
![Page 18: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/18.jpg)
Stream Consumers
Samza Router
EMR
Fronting Kafka
Event Producer
Consumer Kafka
Control Plane
Keystone
![Page 19: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/19.jpg)
• Normal-priority (majority)
• High-priority (streaming activities etc.)
Fronting Kafka Clusters
![Page 20: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/20.jpg)
• 3 ASGs per cluster, 1 ASG per zone
• 3000+ AWS instances across 3 regions for regular & failover traffic
Fronting Kafka Clusters
![Page 21: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/21.jpg)
Deployment Configuration

| | Fronting Kafka Clusters | Consumer Kafka Clusters |
|---|---|---|
| Number of clusters | 24 | 12 |
| Total number of instances | 3,000+ | 900+ |
| Instance type | d2.xl | i2.2xl |
| Replication factor | 2 | 2 |
| Retention period | 8 to 24 hours | 2 to 4 hours |
![Page 22: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/22.jpg)
• All replica assignments zone aware
• Improved availability
• Reduce cost of maintenance
Partition Assignments
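One way to picture zone-aware replica assignment: every partition's replicas are placed on brokers in different zones, so losing a zone never removes all copies of a partition. The sketch below is an assumed round-robin scheme, not Keystone's actual assignment logic; `assign_replicas` and the broker/zone names are hypothetical.

```python
from itertools import cycle


def assign_replicas(num_partitions, brokers_by_zone, replication_factor):
    """Assign each partition's replicas to brokers in distinct zones, round-robin."""
    zones = sorted(brokers_by_zone)
    if replication_factor > len(zones):
        raise ValueError("replication factor exceeds number of zones")
    # One independent round-robin iterator per zone spreads load across its brokers.
    next_broker = {zone: cycle(brokers_by_zone[zone]) for zone in zones}
    assignment = {}
    for partition in range(num_partitions):
        # Rotate the starting zone so first replicas are spread over all zones.
        chosen = [zones[(partition + i) % len(zones)] for i in range(replication_factor)]
        assignment[partition] = [next(next_broker[zone]) for zone in chosen]
    return assignment


brokers = {"zone-a": [1, 2], "zone-b": [3, 4], "zone-c": [5, 6]}
plan = assign_replicas(num_partitions=6, brokers_by_zone=brokers, replication_factor=2)
for partition, replicas in plan.items():
    print(partition, replicas)
```

With a replication factor of 2 across three zones, a whole zone (for example, one ASG) can be taken out for maintenance while every partition keeps at least one replica.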
![Page 23: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/23.jpg)
• Using d2.xl instances involves a trade-off between cost and performance
• Performance deteriorates as the number of partitions increases
• Replication lag during peak traffic
Current Challenges
![Page 24: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/24.jpg)
Routing Service
![Page 25: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/25.jpg)
Stream Consumers
Samza Router
EMR
Fronting Kafka
Event Producer
Consumer Kafka
Control Plane
Keystone
![Page 26: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/26.jpg)
+
Checkpointing Cluster
+ 0.9.1
Routing Infrastructure
![Page 27: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/27.jpg)
Router Job Manager
(Control Plane)
EC2 Instances
ZooKeeper
(Instance Id assignment)
Job
Job
Job
ksnode
Checkpointing Cluster
ASG
Reconcile every min.
![Page 28: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/28.jpg)
• Total of 13,000 containers on 1,300 AWS C3-4XL instances
• S3 sink: ~7000 Containers
• Consumer Kafka sink: ~ 4500 Containers
• ElasticSearch sink: ~1500 Containers
Routing Infrastructure
![Page 29: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/29.jpg)
• One Job per sink and Kafka source topic
• Separate Job each for S3, ElasticSearch & Kafka sink
• Provides better isolation & better QOS
• Batch processed message requests to sinks
• Offset checkpointed after batch request succeeds
Routing Job Details
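The last two bullets (batch the requests, checkpoint the offset only after the batch succeeds) are what give the router at-least-once behavior. The following is a simplified stand-alone sketch, not Samza code: `route_batches`, `flaky_sink`, and the list-as-log model are assumptions made for illustration.

```python
def route_batches(messages, sink_write, batch_size, checkpoint=0):
    """Send messages to a sink in batches; advance the checkpoint only after success.

    `messages` is a list indexed by offset; `sink_write` raises IOError on failure.
    Returns the new checkpoint (offset of the next unprocessed message).
    """
    offset = checkpoint
    while offset < len(messages):
        batch = messages[offset:offset + batch_size]
        try:
            sink_write(batch)
        except IOError:
            # The checkpoint is not advanced, so this batch is retried on the next run.
            # If the sink had partially written the batch, the retry produces duplicates.
            return offset
        offset += len(batch)
    return offset


delivered = []
calls = {"count": 0}


def flaky_sink(batch):
    """Hypothetical sink that fails on its second call only."""
    calls["count"] += 1
    if calls["count"] == 2:
        raise IOError("simulated sink failure")
    delivered.extend(batch)


log = ["m0", "m1", "m2", "m3", "m4"]
first = route_batches(log, flaky_sink, batch_size=2)
final = route_batches(log, flaky_sink, batch_size=2, checkpoint=first)
print(first, final, delivered)
```

Because the offset moves only after a successful write, a crash or sink error can cause reprocessing but not silent skipping of a batch.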
![Page 30: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/30.jpg)
Processing Semantics
Data Loss & Duplicates
![Page 31: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/31.jpg)
Producer ⇐ Kafka Cluster ⇐ Samza job router ⇐ Sink
Keystone - at least once
Backpressure
![Page 32: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/32.jpg)
• buffer full
• network error
• partition leader change
• partition migration
Data Loss - Producer
![Page 33: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/33.jpg)
• Lose all Kafka replicas of data
• Safeguards:
• AZ isolation / Alerts / Broker replacement automation
• Alerts and monitoring
• Unclean partition leader election
• acks = 1 could cause loss
Data Loss - Kafka
![Page 34: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/34.jpg)
• Checkpointed offset is lost and the router is down for the duration of the retention period
• Messages not processed before the retention period expires (8h / 24h)
• Unclean leader election causes the offset to go back
• Safeguard:
• Alerts for lag > 0.1% of traffic for 10 minutes
Data Loss - Router
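The lag alert can be read as "fire when consumer lag stays above 0.1% of traffic for ten consecutive minutes". That reading, the one-sample-per-minute model, and the function name `should_alert` are assumptions; the slide gives only the threshold and the duration.

```python
def should_alert(samples, threshold=0.001, window=10):
    """Return True if lag exceeded `threshold` of traffic in each of the last `window` samples.

    `samples` is a list of (lag_messages, traffic_messages) tuples, one per minute.
    """
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return all(traffic > 0 and lag / traffic > threshold for lag, traffic in recent)


healthy = [(50, 1_000_000)] * 10      # lag is 0.005% of traffic
lagging = [(2_000, 1_000_000)] * 10   # lag is 0.2% of traffic
recovered = lagging[:9] + [(50, 1_000_000)]

print(should_alert(healthy), should_alert(lagging), should_alert(recovered))
```

Requiring the condition to hold across the whole window avoids paging on short spikes that the router catches up on by itself.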
![Page 35: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/35.jpg)
• Messages reprocessed - retry after batch S3 upload failure
• Loss of checkpointed offset (message processed marker)
• Event GUID helps dedup
Data Duplicates - Router
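Since every event carries an injected GUID, a downstream consumer can drop reprocessed copies. A minimal sketch, assuming events are dicts with a `guid` key (a hypothetical field name) and that the set of seen GUIDs fits in memory; a production consumer would need to bound or expire that set.

```python
def dedup_by_guid(events):
    """Yield each event once, keyed on its 'guid' field; later repeats are skipped."""
    seen = set()
    for event in events:
        if event["guid"] in seen:
            continue
        seen.add(event["guid"])
        yield event


incoming = [
    {"guid": "a", "value": 1},
    {"guid": "b", "value": 2},
    {"guid": "a", "value": 1},  # duplicate from a retried batch
]
unique = list(dedup_by_guid(incoming))
print([e["guid"] for e in unique])
```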
![Page 36: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/36.jpg)
![Page 37: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/37.jpg)
Producer to Router to Sink Average Latencies
• Batch processing [S3 sink]: ~1 sec
• Stream processing [Consumer Kafka sink]: ~800 ms
• Log analysis [ElasticSearch]: ~13 seconds (with back pressure)
End to End Metrics
![Page 38: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/38.jpg)
• Broker monitoring (Are you there?)
• Heart-beating & continuous message latency monitoring (Are you healthy?)
• Consumer partition count and offset monitoring (Are you delivering?)
Keystone Monitoring Service
![Page 39: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/39.jpg)
But wait, there’s more
![Page 40: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/40.jpg)
A True Story
![Page 41: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/41.jpg)
Keystone went live 10.27.2015
2 days later...
![Page 42: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/42.jpg)
Multiple ZooKeeper servers became unhealthy
ZooKeeper quorum lost
Producers dropped messages
ZooKeeper quorum recovered
Producers recovered?
![Page 43: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/43.jpg)
Message drop resumed for two largest Kafka clusters with 300+ brokers
Controllers bounced and changed
Began rolling restart
![Page 44: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/44.jpg)
• There are times when things go wrong ... and there is no turning back
• Reduce complexity
• Minimize blast radius
• Find a way to start over fresh
Lessons Learned
![Page 45: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/45.jpg)
• Prefer multiple small clusters
• Largest cluster has less than 200 brokers
• Limit the total number of partitions for a cluster to 10,000
• Strive for even distribution of replicas
• Have dedicated ZooKeeper cluster for each Kafka cluster
Deployment Strategy
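The two numeric guidelines on this slide lend themselves to an automated check. The sketch below simply encodes them as constants; `check_cluster` and the example cluster names and sizes are hypothetical.

```python
MAX_BROKERS = 200        # slide: largest cluster has fewer than 200 brokers
MAX_PARTITIONS = 10_000  # slide: limit on total partitions per cluster


def check_cluster(name, brokers, partitions):
    """Return a list of human-readable guideline violations (empty if none)."""
    problems = []
    if brokers >= MAX_BROKERS:
        problems.append(f"{name}: {brokers} brokers (expected fewer than {MAX_BROKERS})")
    if partitions > MAX_PARTITIONS:
        problems.append(f"{name}: {partitions} partitions (limit {MAX_PARTITIONS})")
    return problems


print(check_cluster("small-cluster", brokers=150, partitions=8_000))
print(check_cluster("oversized-cluster", brokers=320, partitions=12_000))
```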
![Page 46: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/46.jpg)
• Cold standby Kafka cluster with 3 instances and different instance type
• Different ZooKeeper cluster with no state
• When failover occurs
• Scale up failover cluster
• Create topics/new routing jobs for failover cluster
• Switch producer traffic!
Failover
![Page 47: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/47.jpg)
• Fix it!
• Rebuild it!
• Offline maintenance
• Fail back when ready
Post Failover
![Page 48: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/48.jpg)
Failover
Samza Router
Fronting Kafka
Event Producer
X
![Page 49: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/49.jpg)
Time is of the essence - failover as fast as 5 minutes
Fully Automated
Failover
![Page 50: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/50.jpg)
Best Practices
![Page 51: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/51.jpg)
Self Healing - Winston
![Page 52: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/52.jpg)
Kafka Kong
In addition, we do Kafka Kong once a week
![Page 53: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/53.jpg)
What’s Next?
![Page 54: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/54.jpg)
Keystone
Samza Router
EMR
Fronting Kafka
Event Producer
Consumer Kafka
Control Plane
HTTP PROXY
Stream Consumers
![Page 55: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/55.jpg)
Messaging as a Service
Stream Processing as a Service
Management Service
![Page 56: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/56.jpg)
Keystone Messaging Service
![Page 57: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/57.jpg)
• Keystone - Unified event publishing, collection, routing for batch and stream processing
• 85% of the Kafka data volume
• Ad-hoc messaging
• 15% of the Kafka data volume
• Characteristics
• Non-transactional
• Message delivery failure does not affect user experience
Keystone Messaging Service
![Page 58: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/58.jpg)
• Standard service layer for both producers and consumers
• Managed, multi-tenant service built on top of Kafka for its performance, scalability, simplicity and lower cost
Keystone Messaging Service
![Page 59: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/59.jpg)
App Consumers
Stream Processing Service
Consumer API
Producer API
Event Producer API
HTTP
GRPC
Kafka API
Keystone Messaging Service
![Page 60: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/60.jpg)
Keystone Stream Processing Service
![Page 61: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/61.jpg)
• Stream processing consumers are weary of
• Complexity of self-managed infrastructure
• Choice of multiple runtimes across different platforms - none of which can solve all their use cases.
• Stream processing consumers need
• Simple unified model / API / UI / System
Keystone Stream Processing Service
![Page 62: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/62.jpg)
• Managed, multi-tenant platform with a self-service model
• Works across stream processing engines in use at Netflix
Keystone Stream Processing Service
![Page 63: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/63.jpg)
Keystone UI
Container Runtime
Job API
Bounded / Unbounded Data Source
1. Create 2. Submit 3. Launch Runner
Mantis / Flink / Spark
Running Job
1. Submit DSL Job
Job Dashboard
Keystone Stream Processing Service
![Page 64: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/64.jpg)
Keystone Management Service
![Page 65: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/65.jpg)
Keystone Management Service
![Page 66: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/66.jpg)
Keystone Management Service
![Page 67: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/67.jpg)
Keystone Management Service
![Page 68: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/68.jpg)
Keystone Management Service
![Page 69: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/69.jpg)
Keystone Management Service
![Page 70: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/70.jpg)
Keystone Management Service
![Page 71: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/71.jpg)
Keystone Management Service
![Page 72: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/72.jpg)
Keystone Management Service
![Page 73: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/73.jpg)
Keystone Management Service
![Page 74: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/74.jpg)
Keystone Management Service
![Page 75: How we built a 1T/day stream processing cloud platform in ...schd.ws/hosted_files/apachebigdata2016/33/How We Built a Stream... · How we built a 1T/day stream processing cloud platform](https://reader033.fdocuments.us/reader033/viewer/2022051723/5aad5ccd7f8b9a2e088e172d/html5/thumbnails/75.jpg)
???s