Chapter 10: Stream-based Data Management

Chapter 10: Stream-based Data Management. Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core. Authors: Navendu Jain, Lisa Amini, et al.



Page 1: Chapter 10: Stream-based Data Management

Chapter 10: Stream-based Data Management

• Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core

• Authors: Navendu Jain, Lisa Amini, et al.

Page 2: Chapter 10: Stream-based Data Management

Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core

• Problem
  – Problem Statement
  – Why is this problem important?
  – Why is this problem hard?

• Approaches
  – Approach description, key concepts
  – Contributions (novelty, improvements)
  – Assumptions

Page 3: Chapter 10: Stream-based Data Management

Problem Statement

• Given
  – Stream data and continuous queries in large-scale distributed environments
  – A streaming data application (Linear Road)
  – Stream processing middleware (Stream Processing Core, SPC)

• Find
  – Performance bottlenecks of streaming data applications

• Objectives
  – Understand the performance characteristics of the stream data application

• Constraints
  – SPC is constantly overloaded with respect to the available resources.
  – Processing elements are a mix of I/O-bound and CPU-bound.
  – It is unrealistic for applications to keep the full history of a stream in memory, so they are also memory-bound.
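The memory constraint above is commonly handled by keeping only a bounded window of recent stream items. The following is a minimal Python sketch of that idea, not SPC code: the class name `SlidingAverage` and the window size are assumptions made for illustration.

```python
from collections import deque


class SlidingAverage:
    """Running average over the most recent `size` values of a stream.

    Hypothetical helper for illustration; it is not part of SPC.
    """

    def __init__(self, size):
        self.window = deque(maxlen=size)  # oldest value is evicted when full
        self.total = 0.0

    def push(self, value):
        # Subtract the value about to be evicted so the sum stays in sync.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


avg = SlidingAverage(size=3)
results = [avg.push(v) for v in [10, 20, 30, 40]]
print(results)
```

Memory use stays proportional to the window size regardless of how long the stream runs, which is the property the constraint calls for.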

Page 4: Chapter 10: Stream-based Data Management

Why is this problem important?

• High-volume, continuous data are ubiquitous.
  – Text and transactional data
  – Digital audio, video, and images
  – Instant messages, network packet traces
  – Sensor data

• Stream processing applications have become important in the networking and database communities.

Page 5: Chapter 10: Stream-based Data Management

Why is this problem hard?

• Stream data are
  – Large in volume
  – Produced at high data rates
  – Generated by multiple distributed data sources
  – Rapidly updated

• Processing stream data requires
  – Filtering
  – Aggregation
  – Correlation

• A system that supports stream processing applications must address
  – Scalability
  – Latency
  – Resource utilization

Page 6: Chapter 10: Stream-based Data Management

Novelty of Contribution

• Related Work
  – DataCutter, StreamIt: connections between applications are statically determined.
  – TelegraphCQ, Aurora, Borealis, STREAM: support stream data manipulation from a database-centric perspective, but process streams of tuples individually (i.e., at small scale).
  – Benchmarks: previous work on Linear Road did not report any performance numbers.

• Contributions
  – SPC supports dynamic application composition.
  – Evaluates SPC using the Linear Road application under multiple distributed configurations, giving a highly scalable implementation of Linear Road.
  – Studies how the streaming infrastructure supports large-scale continuous and historical queries, identifying performance bottlenecks and tuning them.

Page 7: Chapter 10: Stream-based Data Management

SPC Architecture

• Publish-subscribe model

– Each processing element (PE) that consumes and produces stream data specifies the characteristics of the streams.

– SPC dynamically determines the stream connections by matching stream descriptors as new applications and new data sources join and leave the system.

• Reusing streams
  – Results in significant resource savings.
  – Discovers useful information over an ever-changing set of data sources.
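The descriptor matching described on this slide can be sketched as follows. This is a toy Python model under stated assumptions, not the SPC API: the `StreamRegistry` class, its method names, the attribute-equality matching rule, and the stream and PE names are all hypothetical.

```python
class StreamRegistry:
    """Toy registry that connects streams to PEs by matching descriptors."""

    def __init__(self):
        self.streams = {}          # stream name -> descriptor attributes
        self.subscriptions = {}    # PE name -> attributes the PE requires
        self.connections = set()   # (stream name, PE name) pairs

    @staticmethod
    def _matches(descriptor, required):
        # Assumed rule: a stream matches if every required attribute agrees.
        return all(descriptor.get(k) == v for k, v in required.items())

    def _rematch(self):
        # Recompute connections whenever the set of streams or PEs changes.
        self.connections = {
            (stream, pe)
            for stream, descriptor in self.streams.items()
            for pe, required in self.subscriptions.items()
            if self._matches(descriptor, required)
        }

    def publish(self, stream, descriptor):
        self.streams[stream] = descriptor
        self._rematch()

    def withdraw(self, stream):
        self.streams.pop(stream, None)
        self._rematch()

    def subscribe(self, pe, required):
        self.subscriptions[pe] = required
        self._rematch()


registry = StreamRegistry()
registry.subscribe("toll_pe", {"type": "position_report"})
registry.publish("xway0", {"type": "position_report", "xway": 0})
registry.publish("accounts", {"type": "balance_request"})
after_join = set(registry.connections)   # toll_pe is wired to xway0 only
registry.withdraw("xway0")
after_leave = set(registry.connections)  # the connection disappears again
```

Because connections are derived from descriptors rather than fixed at deployment time, a second PE with the same requirements would reuse the existing stream instead of triggering a new one.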

Page 8: Chapter 10: Stream-based Data Management

Performance Challenges and Optimizations in SPC

• Challenges
  – PEs perform either
    • a small amount of processing on large volumes of data, or
    • a large amount of processing on lower volumes of data.
    Thus the workload is a mix of I/O-bound and CPU-bound PEs.
  – Storing the full stream history in memory is impossible, so PEs are also memory-bound.

• Optimizations
  – SDO filtering: SPC can filter out unwanted objects, saving resources.
  – Events: PEs can subscribe to system events and adapt their algorithms in response.
  – Dynamic copies of PEs
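The SDO filtering optimization can be illustrated with a short Python sketch. It assumes that objects (SDOs) are plain dictionaries and that a PE declares a predicate for the objects it wants; the `deliver` function and the field names are hypothetical and do not reflect the actual SPC interface.

```python
def deliver(sdos, wanted, process):
    """Hand only the objects accepted by `wanted` to `process`.

    Returns the PE outputs and the number of objects filtered out
    before the PE spent any work on them.
    """
    outputs = []
    dropped = 0
    for sdo in sdos:
        if wanted(sdo):
            outputs.append(process(sdo))
        else:
            dropped += 1
    return outputs, dropped


incoming = [
    {"type": "position_report", "speed": 55},
    {"type": "balance_request", "vid": 7},
    {"type": "position_report", "speed": 38},
]

# Hypothetical PE that only cares about position reports.
speeds, dropped = deliver(
    incoming,
    wanted=lambda sdo: sdo["type"] == "position_report",
    process=lambda sdo: sdo["speed"],
)
```

Applying the predicate before delivery means the PE never receives or parses the objects it would discard anyway.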

Page 9: Chapter 10: Stream-based Data Management

Linear Road Benchmark

• Simulates the traffic characteristics of a simple urban expressway system.

• Input to the Linear Road benchmark arrives as stream data.

• Requires a stream-based data management system (SDMS) to process a set of continuous and historical queries.
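The difference between a continuous and a historical query can be sketched in Python as below. This is a heavily simplified illustration, not the benchmark specification: the `TollAccounts` class, the rule of charging on entry to a new segment, and the pricing function are assumptions.

```python
from collections import defaultdict


class TollAccounts:
    """Toy model of one continuous and one historical query."""

    def __init__(self, toll_for_segment):
        self.toll_for_segment = toll_for_segment  # hypothetical pricing function
        self.last_segment = {}                    # vehicle id -> last segment seen
        self.balance = defaultdict(int)           # vehicle id -> tolls charged

    def on_position_report(self, vid, segment):
        """Continuous query: emit a notification on entering a new segment."""
        if self.last_segment.get(vid) == segment:
            return None  # still in the same segment, nothing to emit
        self.last_segment[vid] = segment
        toll = self.toll_for_segment(segment)
        self.balance[vid] += toll
        return (vid, segment, toll)

    def account_balance(self, vid):
        """Historical query: total tolls charged to this vehicle so far."""
        return self.balance.get(vid, 0)


# Assumed pricing: only segment 10 carries a toll.
accounts = TollAccounts(lambda segment: 5 if segment == 10 else 0)
notifications = [accounts.on_position_report(1, s) for s in [9, 9, 10, 10, 11]]
balance = accounts.account_balance(1)
```

The continuous query runs on every arriving report and produces output as a side effect of the stream, while the historical query is asked on demand and answers from state accumulated so far.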

Page 10: Chapter 10: Stream-based Data Management

Prototype Implementation

• Design principles
  – Modularity
  – Data Aggregation
  – Network and Data Locality
  – Flexible Programming Environment

• Linear Road in SPC
  – The figure shows the query network infrastructure comprising 15 PEs.

Page 11: Chapter 10: Stream-based Data Management

Experiments

• The volume of input data increases over time as a stress test.

• Scalability

Page 12: Chapter 10: Stream-based Data Management

Experiments

• Analyzing Bottleneck PEs

• PE Placement Policy

Page 13: Chapter 10: Stream-based Data Management

Summary

• Paper’s focus
  – Understanding the performance characteristics of stream processing applications in a distributed setup

• Ideas
  – Design and implement the Linear Road benchmark on the SPC middleware.
  – Identify the main performance bottlenecks to achieve scalability and low query response latency.

• Contributions
  – Demonstrate a scalable distributed implementation of Linear Road
  – Highlight the importance of addressing performance bottlenecks

• Analytical Validation
  – Experiments
  – Prototyping

Page 14: Chapter 10: Stream-based Data Management

Assumptions, Rewrite today

• Assumptions
  – The evaluation is restricted to SPC support for the Linear Road application, assuming that its design decisions and performance results are applicable to other streaming applications.
  – The system is constantly overloaded with respect to the available resources.
  – PEs are I/O-, CPU-, and memory-bound.

• Rewrite today
  – Apply the ideas to other types of streaming applications.
  – More extensive experiments on performance tuning.