Your Guide to Streaming - The Engineer's Perspective
![Page 1: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/1.jpg)
Stream Computing (The engineer’s perspective)
Ilya Ganelin
![Page 2: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/2.jpg)
Batch vs. Stream
• Batch
  • Process chunks of data instead of one at a time
  • Throughput over latency (seconds, minutes, hours)
  • E.g. MapReduce, Spark, Tez
• Stream
  • Data processed one at a time
  • Latency over throughput (microseconds, milliseconds)
  • E.g. Storm, Flink, Apex, KafkaStreams, GearPump
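The contrast between the two models can be sketched in a few lines of Python. This is only an illustration under simple assumptions (integer records, a sum as the computation); the function names `batch_sums` and `stream_sums` are hypothetical and do not come from any of the platforms named above.

```python
from typing import Iterable, Iterator, List


def batch_sums(records: List[int], chunk_size: int) -> List[int]:
    """Batch style: wait for a full chunk, then process it in one pass."""
    return [
        sum(records[i:i + chunk_size])
        for i in range(0, len(records), chunk_size)
    ]


def stream_sums(records: Iterable[int]) -> Iterator[int]:
    """Stream style: update and emit a running result as each record arrives."""
    total = 0
    for value in records:
        total += value
        yield total
```

The batch version cannot emit anything until a chunk is complete, which favors throughput; the streaming version produces a result per record, which favors latency.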
![Page 3: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/3.jpg)
Scalability, Performance, Durability, Availability
• How do we handle more data?
• Quickly?
• Without ever losing data or compute?
• And ensure the system keeps working, even if there are failures?
![Page 4: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/4.jpg)
![Page 5: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/5.jpg)
What are the tradeoffs?
• If we focus on scalability, it’s harder to guarantee
  • Durability – more moving pieces, more coordination, more failures
  • Availability – more failures, harder to stay operational
  • Performance – bottlenecks and synchronization
• If we focus on availability, it’s harder to guarantee
  • Performance – monitoring and synchronization overhead
  • Scalability and performance
  • Durability – must recover without losing data
• If we focus on durability, it’s harder to guarantee
  • Performance
  • Scalability
![Page 6: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/6.jpg)
Batch compute has it easy.
• Get scale-out and performance by adding hardware and taking longer
• Get durability with a durable data store and recompute
• Get availability by taking longer to recover (this makes life easier!)
• In stream processing, you don’t have time!
![Page 7: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/7.jpg)
It’s not about performance and scale.
• Most platforms handle large volumes of data relatively quickly
• It’s about:
  • Ease of use – how quickly can I build a complex application? Not word count.
  • Failure-handling – what happens when things break?
  • Durability – how do I avoid losing data without sacrificing performance?
  • Availability – how can I keep my system operational with a minimum of labor and without sacrificing performance?
![Page 8: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/8.jpg)
Next: Case Studies in Open-Source Streaming
• Storm
• Flink
• Apex
![Page 9: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/9.jpg)
Apache Storm
• Tried and true, was deployed on 10,000-node clusters at Twitter
  • Scalable
  • Performant
  • Easy to use
• Weaknesses:
  • Failure handling
  • Operationalization at scale
  • Flexibility
• Obsolete?
![Page 10: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/10.jpg)
How does it work?
![Page 11: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/11.jpg)
How does it work?
![Page 12: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/12.jpg)
Failure Detection
![Page 13: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/13.jpg)
Failure Detection
No durability of data in flight or guarantee of exactly once processing!
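Why replay-on-missing-ack does not give exactly-once processing can be shown with a toy model. The sketch below is not Storm's API; `FlakySink` and `replay_until_acked` are hypothetical names, and the only assumption is that a downstream step may apply its side effect and then lose the ack, so the source replays the tuple.

```python
from typing import Iterable, List, Set


class FlakySink:
    """Hypothetical downstream step that applies its effect, then loses one ack."""

    def __init__(self, fail_once_for: Iterable[str]) -> None:
        self.pending_failures: Set[str] = set(fail_once_for)
        self.delivered: List[str] = []

    def process(self, item: str) -> bool:
        self.delivered.append(item)  # the side effect happens first
        if item in self.pending_failures:
            self.pending_failures.discard(item)
            return False  # ack lost, so the source will replay this item
        return True


def replay_until_acked(items: Iterable[str], sink: FlakySink, max_attempts: int = 3) -> None:
    """At-least-once delivery: resend each item until the sink acks it."""
    for item in items:
        for _ in range(max_attempts):
            if sink.process(item):
                break
```

With `FlakySink({"b"})` and input `["a", "b", "c"]`, the sink ends up seeing `"b"` twice: nothing is lost, but the duplicate has to be handled by the application.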
![Page 14: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/14.jpg)
Where do the weaknesses come from?
• Nimbus was a single point of failure (fixed as of 1.0.0 release)
• Upstream bolt/spout failure triggers re-compute on entire tree
• Can only create parallel independent streams by having separate redundant topologies
• Bolts/spouts share a JVM, which makes them hard to debug
• Failed tuples cannot be replayed quicker than 1s (lower limit on Ack)
• No dynamic topologies
• Cannot add or remove applications without service interruption
• Poor resource sharing in large clusters
![Page 15: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/15.jpg)
Enter the Competition – Apache Flink
• Declarative functional API (like Spark)
• But, true streaming platform with support for CEP
• Optimized query execution
• Fast-growing popularity
![Page 16: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/16.jpg)
How does it work?
![Page 17: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/17.jpg)
Failure Handling
![Page 18: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/18.jpg)
So what’s different from Storm?
• Flink handles planning and optimization for you
• Abstracts lower level internals
• Clear semantics around windowing (which Storm has lacked)
• Failure handling is lightweight and fast!
• Exactly once processing (given appropriate connectors at start/end)
• Can run Storm
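The idea behind checkpoint-based recovery can be sketched as follows. This is a deliberately simplified single-operator model, not Flink's API: the function name `run_with_checkpoints` and the `fail_at` parameter are hypothetical, and it assumes the source can be rewound to a saved offset (the "appropriate connectors" condition above).

```python
from typing import List, Optional, Tuple


def run_with_checkpoints(
    events: List[int],
    checkpoint_every: int,
    fail_at: Optional[int] = None,
) -> int:
    """Sum events, snapshotting (offset, state) together; survive one simulated crash."""
    checkpoint: Tuple[int, int] = (0, 0)  # (next offset to read, running sum)
    offset, total = checkpoint
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Roll back operator state and source position to the same snapshot.
            offset, total = checkpoint
            continue
        total += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, total)
    return total
```

Because state and source offset are restored together, events processed after the last snapshot are replayed against the older state, so each event affects the final sum exactly once even though some are processed twice.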
![Page 19: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/19.jpg)
What can’t it do?
• Dynamically update topology
• Dynamically scale
• Recover from errors without stopping the entire DAG
• Allow fine-grained control of how data moves through the system – locality, data partitioning, routing
  • You can do these individually, but not all at once
• The high level API can be a curse (like in Spark)!
![Page 20: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/20.jpg)
So what else is there?
Onyx
![Page 21: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/21.jpg)
Which are unique?
• Apache Beam (Google’s baby - unifies all the platforms)
• Apache Apex (Robust architecture, scalable, fast, durable)
• IBM InfoSphere Streams (proprietary, expensive, the best)
![Page 22: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/22.jpg)
Let’s look at Apex
• Unique provenance
  • Built for the business from experience at Yahoo! – not a research project
  • Built for reliability and strict processing semantics, not performance
  • Apex just works
• Strengths
  • Dynamism
  • Scalability
  • Failure-handling
• Weaknesses
  • Nascent high-level API
  • More complex architecture
![Page 23: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/23.jpg)
How does it work?
![Page 24: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/24.jpg)
![Page 25: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/25.jpg)
Failure Handling
![Page 26: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/26.jpg)
So it’s the best? Sort of!
• Most robust failure-handling
• Allows fine-tuning of data flows and DAG setup
• Excellent exploratory UI
• But
  • Learning curve
  • Nascent high-level API
  • No machine learning support
  • Built for business, not for simplicity
![Page 27: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/27.jpg)
Streaming is great – what about state?
• What if I need to persist data?
• Across operators?
• Retrieve it quickly?
• Do complex analytics?
• And build models?
![Page 28: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/28.jpg)
Why state?
• Historical features (e.g. spend amount over 30 days)
• Statistical aggregates
• Machine learning model training
• Why cross-operator? Because of how data is partitioned, sharing state across operators allows aggregation over multiple fields.
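A historical feature such as "spend amount over 30 days" is a good example of state an operator has to keep between events. The sketch below shows one possible way to maintain it per key; the class `RollingSpend`, integer day numbers, and the in-order arrival assumption are all illustrative choices, not part of the talk.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class RollingSpend:
    """Per-key sum of amounts seen in the trailing `window_days` days."""

    def __init__(self, window_days: int = 30) -> None:
        self.window_days = window_days
        self.events: Dict[str, Deque[Tuple[int, float]]] = defaultdict(deque)
        self.totals: Dict[str, float] = defaultdict(float)

    def add(self, key: str, day: int, amount: float) -> float:
        """Record a purchase; assumes days arrive in non-decreasing order per key."""
        window = self.events[key]
        window.append((day, amount))
        self.totals[key] += amount
        # Evict purchases that have fallen out of the trailing window.
        while window and window[0][0] <= day - self.window_days:
            _, old_amount = window.popleft()
            self.totals[key] -= old_amount
        return self.totals[key]
```

All of this state lives in the operator's memory, which is exactly what raises the persistence and recovery questions on the previous slide.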
![Page 29: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/29.jpg)
Distributed In-Memory Databases
• Can support low-latency streaming use cases
• Durability becomes complicated because memory is volatile
• Memory is expensive and limited
• Examples: ScyllaDB, Geode, Memcached, Redis, MemSQL, Ignite, Hazelcast, Distributed Hash Tables
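One common way to add durability to volatile memory is an append-only log that is replayed on startup. The sketch below is a minimal single-process illustration of that idea, not how any of the listed systems is implemented; `LoggedStore` and its JSON-lines log format are assumptions made for the example.

```python
import json
import os
from typing import Any, Dict


class LoggedStore:
    """In-memory key-value store that logs every write to disk before applying it."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data: Dict[str, Any] = {}
        # Recovery: rebuild the in-memory state by replaying the log in order.
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key: str, value: Any) -> None:
        # Persist first, so an acknowledged write survives a crash.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def get(self, key: str) -> Any:
        return self.data.get(key)
```

Reads stay memory-fast, but every write now pays for a disk sync, which is the durability versus performance tradeoff discussed earlier.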
![Page 30: Your Guide to Streaming - The Engineer's Perspective](https://reader036.fdocuments.us/reader036/viewer/2022062905/5871a1c81a28ab044e8b6fc7/html5/thumbnails/30.jpg)