Streaming Analytics & CEP - Two sides of the same coin?
![Page 1: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/1.jpg)
Till Rohrmann [email protected] @stsffap
Fabian Hueske [email protected] @fhueske
Streaming Analytics & CEP Two sides of the same coin?
![Page 2: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/2.jpg)
Streams are Everywhere
§ Most data is continuously produced as a stream
§ Processing data as it arrives is becoming very popular
§ Many diverse applications and use cases
![Page 3: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/3.jpg)
Complex Event Processing
§ Analyzing a stream of events and drawing conclusions
• Detect patterns and assemble new events
§ Applications
• Network intrusion
• Process monitoring
• Algorithmic trading
§ Demanding requirements on stream processor
• Low latency!
• Exactly-once semantics & event-time support
![Page 4: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/4.jpg)
Batch Analytics
§ The batch approach to data analytics
![Page 5: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/5.jpg)
Streaming Analytics
§ Online aggregation of streams
• No delay – Continuous results
§ Stream analytics subsumes batch analytics
• Batch is a finite stream
§ Demanding requirements on stream processor
• High throughput
• Exactly-once semantics
• Event-time & advanced window support
![Page 6: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/6.jpg)
Apache Flink™
§ Platform for scalable stream processing
§ Meets requirements of CEP and stream analytics
• Low latency and high throughput
• Exactly-once semantics
• Event-time & advanced windowing
§ Core DataStream API available for Java & Scala
![Page 7: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/7.jpg)
This Talk is About
§ Flink’s new APIs for CEP and Stream Analytics
• DSL to define CEP patterns and actions
• Stream SQL to define queries on streams
§ Integration of CEP and Stream SQL
§ Early stage → Work in progress
![Page 8: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/8.jpg)
Tracking an Order Process Use Case
![Page 9: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/9.jpg)
Order Fulfillment Scenario
![Page 10: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/10.jpg)
Order Events
§ Process is reflected in a stream of order events
§ Order(orderId, tStamp, "received")
§ Shipment(orderId, tStamp, "shipped")
§ Delivery(orderId, tStamp, "delivered")
§ orderId: Identifies the order
§ tStamp: Time at which the event happened
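The three event types could be modelled in Scala roughly as follows. This is only a sketch: the slides do not show the actual class definitions, so the `Event` trait, the field types (`Long` for both `orderId` and `tStamp`, with `tStamp` assumed to be epoch milliseconds) and the default `status` values are assumptions.

```scala
// Hypothetical event model; names follow the slide, types are assumed.
sealed trait Event {
  def orderId: Long  // identifies the order (type assumed)
  def tStamp: Long   // event time, assumed to be epoch milliseconds
  def status: String // "received", "shipped" or "delivered"
}

final case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
final case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
final case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event
```

A common supertype with a `status` field lets later stages either filter on the status string or match on the concrete subtype.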
![Page 11: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/11.jpg)
Aggregating Massive Streams Stream Analytics
![Page 12: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/12.jpg)
Stream Analytics
§ Traditional batch analytics
• Repeated queries on finite and changing data sets
• Queries join and aggregate large data sets
§ Stream analytics
• “Standing” query produces continuous results from infinite input stream
• Query computes aggregates on high-volume streams
§ How to compute aggregates on infinite streams?
![Page 13: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/13.jpg)
Compute Aggregates on Streams
§ Split infinite stream into finite “windows”
• Split usually by time
§ Tumbling windows
• Fixed size & consecutive
§ Sliding windows
• Fixed size & may overlap
§ Event time mandatory for correct & consistent results!
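To make the two window types concrete, the sketch below assigns an event timestamp to its window(s) in plain Scala. It does not use Flink's window assigners. The object and function names are hypothetical, and windows are assumed to be aligned to timestamp 0 and to cover the half-open interval [start, start + size).

```scala
object WindowSketch {
  /** Start of the single tumbling window of length `size` that contains `ts`. */
  def tumblingStart(ts: Long, size: Long): Long =
    ts - java.lang.Math.floorMod(ts, size)

  /** Starts of all sliding windows (length `size`, advancing by `slide`) that contain `ts`. */
  def slidingStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    // Latest window start that is not after ts.
    val lastStart = ts - java.lang.Math.floorMod(ts, slide)
    // Step backwards by `slide` while [start, start + size) still covers ts.
    Iterator.iterate(lastStart)(_ - slide).takeWhile(_ > ts - size).toList
  }
}
```

With `size = 60` and `slide = 30`, a timestamp of 125 falls into exactly one tumbling window (starting at 120) but into two sliding windows (starting at 120 and 90), which is the overlap the slide refers to.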
![Page 14: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/14.jpg)
Example: Count Orders by Hour
![Page 15: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/15.jpg)
Example: Count Orders by Hour
SELECT STREAM
  TUMBLE_START(tStamp, INTERVAL '1' HOUR) AS hour,
  COUNT(*) AS cnt
FROM events
WHERE status = 'received'
GROUP BY TUMBLE(tStamp, INTERVAL '1' HOUR)
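On a finite list of events, the result of this query can be reproduced with ordinary Scala collections, which may help when reading the SQL. The sketch below is not how Flink executes the query. The `Ev` class and all names in it are hypothetical, and `tStamp` is assumed to be epoch milliseconds.

```scala
object HourlyCountSketch {
  // Hypothetical minimal event: only the fields the query touches.
  final case class Ev(tStamp: Long, status: String)

  val HourMs: Long = 60L * 60L * 1000L

  /** Maps the start of each one-hour tumbling window to its number of "received" events. */
  def countReceivedPerHour(events: Seq[Ev]): Map[Long, Long] =
    events
      .filter(_.status == "received")                                     // WHERE status = 'received'
      .groupBy(e => e.tStamp - java.lang.Math.floorMod(e.tStamp, HourMs)) // GROUP BY TUMBLE(...)
      .map { case (hourStart, evs) => hourStart -> evs.size.toLong }      // COUNT(*)
}
```

The streaming version differs in that it emits each hourly count as soon as the window can be closed, instead of waiting for the whole input.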
![Page 16: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/16.jpg)
Stream SQL Architecture
§ Flink features SQL on static and streaming tables
§ Parsing and optimization by Apache Calcite
§ SQL queries are translated into native Flink programs
![Page 17: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/17.jpg)
Pattern Matching on Streams Complex Event Processing
![Page 18: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/18.jpg)
Real-time Warnings
![Page 19: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/19.jpg)
CEP to the Rescue
§ Define processing and delivery intervals (SLAs)
§ ProcessSucc(orderId, tStamp, duration)
§ ProcessWarn(orderId, tStamp)
§ DeliverySucc(orderId, tStamp, duration)
§ DeliveryWarn(orderId, tStamp)
§ orderId: Identifies the order
§ tStamp: Time when the event happened
§ duration: Duration of the processing/delivery
![Page 20: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/20.jpg)
CEP Example
![Page 21: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/21.jpg)
Processing: Order → Shipment

val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("received").orderId,
        fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }
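The intended outcome of this pattern can be illustrated on a finite list of events in plain Scala, without the Flink CEP library. This is a simplified sketch under several assumptions: only the first "received" event per order is considered, a shipment counts only if it arrives strictly less than one hour later, and the warning carries the deadline as its timestamp. Flink's own matching and timeout semantics may differ in these details, and all names below are hypothetical.

```scala
object ProcessingPatternSketch {
  final case class Ev(orderId: Long, tStamp: Long, status: String)
  final case class ProcessWarn(orderId: Long, tStamp: Long)
  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)

  val HourMs: Long = 60L * 60L * 1000L

  // Evaluates one order: Right on timely shipment, Left on timeout, None if never received.
  private def evaluateOrder(orderId: Long, evs: Seq[Ev]): Option[Either[ProcessWarn, ProcessSucc]] = {
    val sorted = evs.sortBy(_.tStamp)
    sorted.find(_.status == "received").map { received =>
      val deadline = received.tStamp + HourMs // assumed: within(1 hour) excludes the deadline itself
      val shipped = sorted.find(e =>
        e.status == "shipped" && e.tStamp >= received.tStamp && e.tStamp < deadline)
      shipped match {
        case Some(s) => Right(ProcessSucc(orderId, s.tStamp, s.tStamp - received.tStamp))
        case None    => Left(ProcessWarn(orderId, deadline))
      }
    }
  }

  /** Groups by orderId (the keyBy step) and evaluates each order, sorted by orderId. */
  def evaluate(events: Seq[Ev]): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap {
      case (orderId, evs) => evaluateOrder(orderId, evs)
    }
}
```

Returning an `Either` mirrors the slide's result type: timeouts end up on the left side and completed matches on the right, so a downstream step can split the two cases.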
![Page 22: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/22.jpg)
… and both at the same time! Integrated Stream Analytics with CEP
![Page 23: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/23.jpg)
Count Delayed Shipments
![Page 24: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/24.jpg)
Compute Avg Processing Time
![Page 25: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/25.jpg)
CEP + Stream SQL
// complex event processing result
val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = …

val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption)

val deliveryWarningTable: Table = delWarn.toTable(tableEnv)
tableEnv.registerTable("deliveryWarnings", deliveryWarningTable)

// calculate the delayed deliveries per day
val delayedDeliveriesPerDay = tableEnv.sql(
  """SELECT STREAM
    |  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS day,
    |  COUNT(*) AS cnt
    |FROM deliveryWarnings
    |GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)""".stripMargin)
![Page 26: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/26.jpg)
CEP-enriched Stream SQL
SELECT
  TUMBLE_START(tStamp, INTERVAL '1' DAY) as day,
  AVG(duration) as avgDuration
FROM (
  -- CEP pattern
  SELECT (b.tStamp - a.tStamp) as duration, b.tStamp as tStamp
  FROM inputs
  PATTERN a FOLLOW BY b
  PARTITION BY orderId
  ORDER BY tStamp
  WITHIN INTERVAL '1' HOUR
  WHERE a.status = 'received' AND b.status = 'shipped'
)
GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)
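The outer aggregation of this query, the average processing duration per day, can be sketched on a finite collection of successful matches in plain Scala. The inner CEP pattern is assumed to have already produced the `ProcessSucc` records. The names are hypothetical, and `tStamp` and `duration` are assumed to be in milliseconds.

```scala
object AvgDurationSketch {
  // One successful received-then-shipped match, as produced by the CEP step.
  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)

  val DayMs: Long = 24L * 60L * 60L * 1000L

  /** Maps the start of each one-day tumbling window to the mean duration within it. */
  def avgDurationPerDay(results: Seq[ProcessSucc]): Map[Long, Double] =
    results
      .groupBy(r => r.tStamp - java.lang.Math.floorMod(r.tStamp, DayMs)) // GROUP BY TUMBLE(...)
      .map { case (dayStart, rs) =>
        dayStart -> rs.map(_.duration.toDouble).sum / rs.size            // AVG(duration)
      }
}
```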
![Page 27: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/27.jpg)
Conclusion
§ Apache Flink handles CEP and analytical workloads
§ Apache Flink offers intuitive APIs
§ New class of applications by CEP and Stream SQL integration :-)
![Page 28: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/28.jpg)
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
![Page 29: Streaming Analytics & CEP - Two sides of the same coin?](https://reader033.fdocuments.us/reader033/viewer/2022051404/5871381d1a28abf0568b6297/html5/thumbnails/29.jpg)
We are hiring! data-artisans.com/careers