Streaming Data, Continuous Queries, and Adaptive Dataflow
Michael Franklin, UC Berkeley
NRC June 2002
Data Stream Processing
Networked data streams are central to current and future computing.
Existing data management and query processing infrastructure is lacking in:
– Adaptability
– Continuous and incremental processing
– Work sharing for large scale
– Resource scalability: from “smart dust” up to clusters to grids
XML provides additional opportunities.
Example 1: “Transactional Flows”
E-commerce, clickstream, swipestream, logs…
Network monitoring
B2B and enterprise apps
– Supply chain, CRM, ERP
(Quasi) real-time flow of events and data.
Must manage these flows to drive business processes.
Mine flows to create and adjust business rules.
Can also “tap into” flows for online analysis.
Example 2: Information Dissemination
[Figure: Data Sources, Filtered Data, Users, and User Profiles]
• Document creation or a crawler initiates the flow of data toward users.
• Profiles are aggregated back toward the data.
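To make the two directions of flow concrete, here is a minimal Python sketch, assuming a tree of hypothetical `Broker` nodes in which each user's profile is just a set of topic strings. Profiles are unioned on the way up toward the source, and a document travels down only into subtrees whose aggregated profile shares a topic with it. The class, the topic names, and the set-overlap matching rule are illustrative assumptions, not the design of any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical dissemination node in a routing tree.

    Leaves carry a user's profile (a set of topics); inner nodes carry none.
    """
    name: str
    topics: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def aggregated_profile(self) -> set:
        # Profiles are aggregated back toward the data: the union of every
        # profile in this subtree. Recomputed on each call for brevity.
        agg = set(self.topics)
        for child in self.children:
            agg |= child.aggregated_profile()
        return agg

    def deliver(self, doc_topics: set, out: list) -> None:
        # Data flows toward users; a subtree is skipped entirely when
        # no profile inside it shares a topic with the document.
        if not doc_topics & self.aggregated_profile():
            return
        if self.topics & doc_topics:
            out.append(self.name)
        for child in self.children:
            child.deliver(doc_topics, out)


alice = Broker("alice", {"db"})
bob = Broker("bob", {"networks"})
carol = Broker("carol", {"sensors"})
root = Broker("root", children=[Broker("left", children=[alice, bob]), carol])

received = []
root.deliver({"db", "xml"}, received)
print(received)  # ['alice']
```

A real dissemination network would cache each aggregated profile and refresh it only when profiles change, rather than recomputing it on every delivery as this sketch does.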
Example 3: Sensor Nets
Tiny (or not so tiny) devices measure the physical world.
– Berkeley “motes”, Smart Dust, Smart Tags, …
Many monitoring applications
– Transportation, seismic, energy, military, …
Form dynamic ad hoc networks.
Aggregate and communicate streams of values.
Not one-way: can actuate to affect or actively monitor the environment.
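“Aggregate and communicate streams of values” is often done in-network: rather than shipping every raw reading to a base station, each node merges a small partial state from its children and forwards only that. The Python sketch below assumes a fixed routing tree and computes an average from (sum, count) pairs; the `Sensor` class, the tree, and the readings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sensor:
    """Hypothetical node in a fixed routing tree of sensors."""
    name: str
    reading: float
    children: list = field(default_factory=list)


def partial_state(node: Sensor) -> tuple:
    """Return (sum, count) for the subtree rooted at node.

    Each node merges its children's partial states with its own reading,
    so only two numbers cross each radio link instead of every raw reading.
    """
    total, count = node.reading, 1
    for child in node.children:
        child_total, child_count = partial_state(child)
        total += child_total
        count += child_count
    return total, count


def network_average(root: Sensor) -> float:
    # Only the root turns the merged partial state into the final answer.
    total, count = partial_state(root)
    return total / count


relay = Sensor("relay", 21.0, [Sensor("a", 20.0), Sensor("b", 22.0)])
base = Sensor("base", 25.0, [relay, Sensor("c", 24.0)])
print(partial_state(relay))   # (63.0, 3)
print(network_average(base))  # 22.4
```

The partial state carries the count as well as the sum because child averages alone cannot be merged correctly when subtrees differ in size.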
Common Features
Centrality of dataflow and data routing
– Architecture is focused on data movement
– Moving streams of data through code in a network
Volatility of the environment
– Dynamic resources & topology, partial failures
– Long-running (never-ending?) tasks
– Potential for user interaction during the flow
– Large scale: users, data, resources, …
Resource constraints
– Bandwidth, memory, processing, battery, …
– Time and human attention
In The Beginning
[Figure: Data, Index, Query, and Result; a query is evaluated against indexed, stored data]
Pub-Sub/CQ/Filtering
[Figure: Queries, Index, Data, and Result; arriving data is matched against an index built over the queries]
• Effectively processes all queries simultaneously.
• Shares work for common subexpressions.
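One way to picture the “Index” over queries is a predicate index: standing queries are stored in structures keyed by their constants, and each arriving data item probes those structures once instead of being tested against every query in turn. The Python sketch below assumes, hypothetically, that each query is a single predicate on one numeric attribute, either `attr == v` or `attr > c`; the `QueryIndex` name and its methods are illustrative.

```python
import bisect
from collections import defaultdict


class QueryIndex:
    """Hypothetical index over standing queries of the form
    `attr == v` or `attr > c` on a single numeric attribute."""

    def __init__(self) -> None:
        self.eq = defaultdict(set)  # v -> ids of queries "attr == v"
        self.gt_consts = []         # sorted constants c of queries "attr > c"
        self.gt_ids = []            # query ids, parallel to gt_consts

    def add_equals(self, qid: str, value: float) -> None:
        self.eq[value].add(qid)

    def add_greater(self, qid: str, const: float) -> None:
        # Keep thresholds sorted so one search can answer all of them.
        pos = bisect.bisect_left(self.gt_consts, const)
        self.gt_consts.insert(pos, const)
        self.gt_ids.insert(pos, qid)

    def match(self, x: float) -> set:
        """Return the ids of every query satisfied by the value x."""
        hits = set(self.eq.get(x, ()))
        # Every threshold strictly below x is satisfied.
        cut = bisect.bisect_left(self.gt_consts, x)
        hits.update(self.gt_ids[:cut])
        return hits


idx = QueryIndex()
idx.add_equals("q1", 10)
idx.add_greater("q2", 5)
idx.add_greater("q3", 12)
print(sorted(idx.match(10)))  # ['q1', 'q2']
print(sorted(idx.match(15)))  # ['q2', 'q3']
```

All `attr > c` queries are answered by a single binary search over the sorted thresholds, which is one simple form of sharing work across queries.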
Telegraph/PSoup: Query & Data Duality
[Figure: Queries, Data, an Index for each, and Result]
Telegraph/PSoup: Query & Data Duality
[Figure: Queries, Data, an Index for each, Result, and a single incoming Query]
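The duality can be sketched as a symmetric join between a stream of data and a stream of queries: whichever side arrives is stored and then probes everything stored on the other side. In this minimal Python sketch, plain lists and dicts stand in for the two indexes, and queries are hypothetical Python predicates over dict-shaped tuples; it is not PSoup's actual implementation.

```python
from typing import Callable, Dict, List

Predicate = Callable[[dict], bool]


class DualityStore:
    """Hypothetical store in which data and queries are treated symmetrically."""

    def __init__(self) -> None:
        self.data: List[dict] = []                # stands in for the data index
        self.queries: Dict[str, Predicate] = {}   # stands in for the query index
        self.results: Dict[str, List[dict]] = {}  # materialized matches per query

    def add_data(self, tup: dict) -> None:
        # New data is stored, then probes every stored query.
        self.data.append(tup)
        for qid, pred in self.queries.items():
            if pred(tup):
                self.results[qid].append(tup)

    def add_query(self, qid: str, pred: Predicate) -> None:
        # A new query is stored, then probes every stored tuple.
        self.queries[qid] = pred
        self.results[qid] = [t for t in self.data if pred(t)]


store = DualityStore()
store.add_data({"id": 1, "temp": 30})
store.add_query("hot", lambda t: t["temp"] > 25)  # sees the earlier tuple
store.add_data({"id": 2, "temp": 20})
store.add_data({"id": 3, "temp": 40})             # reaches the stored query
print([t["id"] for t in store.results["hot"]])    # [1, 3]
```

Because results are maintained as arrivals happen, a query registered late still sees earlier data, and data arriving later still reaches earlier queries.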
PSoup – Query Invocation
PSoup continuously maintains materialized views over streaming data and queries.
Data is returned to the user when a query is invoked.
– Invocation requires applying “windows” to precomputed results.
The adaptive approach allows the system to continuously absorb new data and new queries without recompilation.
Lots of issues to study:
– Query indexing, spilling to disk, bulk processing
– Other semantics and interaction models (e.g., alerts)
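Applying a window at invocation time can be sketched as follows, assuming (hypothetically) that each query's precomputed matches are kept in arrival-timestamp order and that a window is the half-open time interval `(now - width, now]`. The `MaterializedResults` class and its methods are illustrative names, not PSoup's API.

```python
import bisect
from collections import defaultdict


class MaterializedResults:
    """Hypothetical per-query result store; windows are applied only on invocation."""

    def __init__(self) -> None:
        self.times = defaultdict(list)  # qid -> arrival timestamps, ascending
        self.rows = defaultdict(list)   # qid -> matching rows, parallel to times

    def record(self, qid: str, ts: float, row) -> None:
        # Called continuously as matches are found; assumes ts never decreases.
        self.times[qid].append(ts)
        self.rows[qid].append(row)

    def invoke(self, qid: str, now: float, width: float) -> list:
        """Return the rows whose timestamps fall in (now - width, now]."""
        ts = self.times[qid]
        lo = bisect.bisect_right(ts, now - width)
        hi = bisect.bisect_right(ts, now)
        return self.rows[qid][lo:hi]


mr = MaterializedResults()
for ts, row in [(1, "a"), (4, "b"), (7, "c"), (9, "d")]:
    mr.record("q", ts, row)
print(mr.invoke("q", now=9, width=5))  # ['c', 'd']
print(mr.invoke("q", now=5, width=5))  # ['a', 'b']
```

Because matching has already happened, invocation reduces to two binary searches and a slice, so the user pays only for extracting the window, not for re-running the query.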
Stream Processing Research Agenda
Need continuously adaptive processing.
Need an appropriate data model & query language.
– Window semantics: input and output
– Notification semantics & thresholds
Approximation, satisficing, and QoS
– Must be driven by user needs and context
– Adapt to available resources & time constraints
Integration & interaction with “pooled” data
– Time travel, archiving, “normal” databases
Structured, semi-structured, and unstructured data; XML, etc.
Sensor-sensitive processing.
Metrics and benchmarks (challenge problems).
Conclusions
Dataflow and streaming are central to many emerging application areas.
– Solutions require a mixture of database and networking approaches: adaptivity and tolerance of partial failure, and exploitation of user, app, and data semantics.
A new infrastructure is needed to solve these problems.
– Duality of data and queries
This is currently a topic of major interest in the research community.