Real-time Big Data streaming integration for sensor networks

Page 1: Real-time Big Data streaming integration for sensor networks

Copyright © 2012 – Proprietary and Confidential Information of SQLstream Inc.

Real-time Control in a Big Data World

Sensors Expo, 2012

Presenter: Damian Black, SQLstream CEO

Page 2: Real-time Big Data streaming integration for sensor networks

Agenda

» What is Streaming Big Data?
» The “Sensor Internet” – a real-time connected world
» Architectures for processing real-time/fast Big Data
» Sharing and reusing data with Relational Streaming
» Case studies and examples
» Relational Streaming and Hadoop
» Mapping out the data management space
» Conclusions

Page 3: Real-time Big Data streaming integration for sensor networks

Real-time Big Data

First, what is a Streaming Big Data Platform?
» Stream any data in, immediately stream out real-time answers.
» Continuously analyze and process massive data volumes.
» React in real-time to each and every new record.

And what, then, is Relational Streaming?
» A Streaming Big Data paradigm for processing data streams.
» Familiar relational expressions with automatic optimization.
» Queries executed continuously on a massively parallel scale.

Page 4: Real-time Big Data streaming integration for sensor networks

A real-time connected world of sensors

» Technology Drivers
    » GPS-enabled devices
    » Low-cost wireless sensors
    » Ultra-low-power sensors
» Business & Environmental Drivers
    » Congestion Reduction
    » Smart Energy & Environment Monitoring
    » V2V, V2I and Smart Transportation
    » M2M
    » RFIDs & the ‘Internet of Things’

Page 5: Real-time Big Data streaming integration for sensor networks

Today’s operational platforms – far from real-time

Poorly integrated operational platforms based on traditional store-and-process technology

Massive volumes of streaming data: Service, System, Sensors

Exponential Growth

Page 6: Real-time Big Data streaming integration for sensor networks

Analyze Streaming Data in Flight

SQLstream s-Server

Respond to real-time analysis

Real-time alerts and visibility with continuously streaming results.

Historical data used for predictive real-time analytics

Existing operational systems and data warehouses kept up to date in real-time with continuous ETL

Massive volumes of streaming data: Service, System, Sensors

Exponential Growth

Page 7: Real-time Big Data streaming integration for sensor networks

Streaming Data Processing – Achieving Scalability

» Fine-grained dataflow: pipelined & superscalar parallel processing
» Reuse of analytics and data streams across nodes
» Avoid transactional bottlenecks – fine-grained streaming dataflow
» SQL as a parallel dataflow language – standard, familiar, proven
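As a rough illustration of fine-grained, pipelined dataflow, the sketch below chains three small stages so that each record passes through the whole pipeline before the next one is read. It is plain single-threaded Python, not SQLstream's engine; the stage names, the `sensor_id,value` line format and the threshold are assumptions made up for the example.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: turn raw 'sensor_id,value' text into records, one at a time."""
    for line in lines:
        sensor_id, value = line.split(",")
        yield {"sensor": sensor_id, "value": float(value)}


def keep_above(records: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Stage 2: a filter, the streaming analogue of a WHERE clause."""
    for record in records:
        if record["value"] > threshold:
            yield record


def to_alert(records: Iterable[dict]) -> Iterator[str]:
    """Stage 3: a projection, the streaming analogue of a SELECT list."""
    for record in records:
        yield f"ALERT {record['sensor']}: {record['value']}"


def pipeline(lines: Iterable[str], threshold: float = 50.0) -> Iterator[str]:
    # The stages are chained lazily: a record is parsed, filtered and
    # formatted before the next line is read, so no stage stores the
    # whole data set.
    return to_alert(keep_above(parse(lines), threshold))


if __name__ == "__main__":
    feed = ["s1,42.0", "s2,73.5", "s1,55.1"]  # hypothetical sensor feed
    for alert in pipeline(feed):
        print(alert)
```

Because every stage is an independent record-at-a-time operator, stages like these could in principle be placed on different cores or nodes, which is the property the slide attributes to fine-grained dataflow.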

Page 8: Real-time Big Data streaming integration for sensor networks

Streaming Data Processing & Windows
Overview of real-time processing pipelines

Data sources such as log files, sensors and API feeds are turned into streaming data feeds.

[Diagram: real-time data, streaming data feed]

Example: Continuous query for real-time alerts

    -- Continuous view: New York orders matched by a shipment
    -- within the one-hour SLA window
    CREATE VIEW sla_fulfilled AS
      SELECT STREAM *
      FROM orders OVER sla
        JOIN shipments
        ON orders.id = shipments.orderid
      WHERE city = 'New York'
      WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)
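To make the windowed join in the continuous query above more concrete, here is a minimal Python sketch of the same idea: orders from the last hour are kept in memory, and each arriving shipment is matched against them. It is an illustration under assumptions, not how s-Server executes the view; the `Order` and `Shipment` record types, the integer timestamps in seconds, and the choice to treat `city` as a column of `orders` are all assumed for the example.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple, Tuple, Union


class Order(NamedTuple):
    ts: int      # event time in seconds (assumed representation)
    id: int
    city: str    # assumed to be a column of orders


class Shipment(NamedTuple):
    ts: int
    orderid: int


WINDOW_SECONDS = 3600  # RANGE INTERVAL '1' HOUR PRECEDING


def sla_fulfilled(
    events: Iterable[Union[Order, Shipment]], city: str = "New York"
) -> Iterator[Tuple[Order, Shipment]]:
    """Yield (order, shipment) pairs whose shipment arrives within one hour.

    `events` must arrive in time order, as rows of a stream do.
    """
    recent = deque()  # orders seen in the last hour, oldest first
    for event in events:
        # Evict orders that have slid out of the one-hour window.
        while recent and recent[0].ts < event.ts - WINDOW_SECONDS:
            recent.popleft()
        if isinstance(event, Order):
            recent.append(event)
        else:
            for order in recent:
                if order.id == event.orderid and order.city == city:
                    yield (order, event)


if __name__ == "__main__":
    stream = [
        Order(0, 1, "New York"),
        Shipment(1800, 1),         # 30 minutes later: inside the window
        Order(2000, 3, "New York"),
        Shipment(6000, 3),         # more than an hour later: no match
    ]
    for order, shipment in sla_fulfilled(stream):
        print(order.id, shipment.ts)
```

The only state held is the last hour of orders, which is why a query of this shape can run continuously over an unbounded stream.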

Page 9: Real-time Big Data streaming integration for sensor networks

Case Study: Traffic Analytics from GPS Data

Page 10: Real-time Big Data streaming integration for sensor networks

Case Study: Traffic Analytics from GPS Data

10-meter road segments

Road segment GIS database

One GPS event per vehicle per second

Historical trend data

Objective: Accurate and reliable journey time information with dynamic updating of alternative routes, identifying ‘worse than usual’ events and predictive incident detection.

Page 11: Real-time Big Data streaming integration for sensor networks

SQL as an API – Simplifying Analytics

» Example: Compute average speed across any subset of the road network over rolling time windows from GPS events
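The slide does not show the query itself, so the following is only a sketch of what a rolling-window average speed over a chosen set of road segments could look like in plain Python. The class name, the five-minute default window, and the `(timestamp, segment_id, speed)` event shape are assumptions for illustration, not details taken from the case study.

```python
from collections import defaultdict, deque
from typing import Iterable, Optional


class RollingSegmentSpeed:
    """Keeps recent GPS speed readings per road segment (hypothetical helper)."""

    def __init__(self, window_seconds: int = 300):
        self.window = window_seconds
        self.events = defaultdict(deque)  # segment_id -> deque of (ts, speed)

    def add(self, ts: int, segment_id: str, speed: float) -> None:
        """Record one GPS event; events are assumed to arrive in time order."""
        self.events[segment_id].append((ts, speed))
        self._evict(segment_id, ts)

    def _evict(self, segment_id: str, now: int) -> None:
        # Drop readings that are older than the rolling window.
        readings = self.events[segment_id]
        while readings and readings[0][0] < now - self.window:
            readings.popleft()

    def average(self, segment_ids: Iterable[str], now: int) -> Optional[float]:
        """Average speed over all in-window readings on the given segments."""
        total, count = 0.0, 0
        for segment_id in segment_ids:
            self._evict(segment_id, now)
            readings = self.events[segment_id]
            total += sum(speed for _, speed in readings)
            count += len(readings)
        return total / count if count else None


if __name__ == "__main__":
    speeds = RollingSegmentSpeed(window_seconds=60)
    speeds.add(0, "A", 10.0)
    speeds.add(30, "A", 20.0)
    speeds.add(40, "B", 30.0)
    print(speeds.average(["A", "B"], now=50))
```

Keying the state by segment is what lets the same stream answer questions about any subset of the road network without rescanning stored data.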

Page 12: Real-time Big Data streaming integration for sensor networks

Case Study: Real-time Seismic Event Detection

Page 13: Real-time Big Data streaming integration for sensor networks

Input Signal Data (blue) and Detected Quakes (red)

Page 14: Real-time Big Data streaming integration for sensor networks

The ‘Sensor Internet’ for Services

» Many sensors streaming data over the Internet in real-time.
» Streaming analytics maintained over varying time windows.
» Aggregated and continuously sorted: streaming “order by”.

Page 15: Real-time Big Data streaming integration for sensor networks

Streaming SQL: Decaying Service Monitor Scoring

» Millions of events per second
» Real-time service scoring
» Amazon EC2

    -- Pump: continuously inserts the query result into "SERVICE_SCORE"
    CREATE OR REPLACE PUMP "SONG_SCORE_PUMP" STOPPED AS
    INSERT INTO "SERVICE_SCORE" ("serviceId", "SCORE")
    SELECT STREAM
        "SERVICE_ID" AS "serviceId",
        -- Points from each older week count half as much as the week after it
        SUM("POINTS") OVER "LAST_WEEK" +
        ((SUM("POINTS") OVER "LAST_2_WEEKS" - SUM("POINTS") OVER "LAST_WEEK") * 0.5) +
        ((SUM("POINTS") OVER "LAST_3_WEEKS" - SUM("POINTS") OVER "LAST_2_WEEKS") * 0.25) +
        ((SUM("POINTS") OVER "LAST_4_WEEKS" - SUM("POINTS") OVER "LAST_3_WEEKS") * 0.125) AS "SCORE"
    FROM "SERVICE_SCORES"
    WINDOW
        "LAST_WEEK" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '7' DAY PRECEDING),
        "LAST_2_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '14' DAY PRECEDING),
        "LAST_3_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '21' DAY PRECEDING),
        "LAST_4_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '28' DAY PRECEDING);
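The arithmetic behind the decaying score may be easier to follow outside SQL. The sketch below recomputes the same weighted sum for a single partition in plain Python: points from the most recent 7 days count in full, and each older week counts half as much as the one after it. The function names and the `(timestamp, points)` event shape are assumptions for the example; it rescans the event list on every call, which a streaming engine would avoid.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (window length in days, weight applied to points that first appear in it)
WEIGHTS = [(7, 1.0), (14, 0.5), (21, 0.25), (28, 0.125)]

Event = Tuple[datetime, float]  # (timestamp, points), one partition only


def window_sum(events: List[Event], now: datetime, days: int) -> float:
    """Sum of points in the `days` preceding `now`, like SUM(...) OVER a RANGE window."""
    cutoff = now - timedelta(days=days)
    return sum(points for ts, points in events if cutoff <= ts <= now)


def decayed_score(events: List[Event], now: datetime) -> float:
    """Weighted sum of the four weekly bands, mirroring the SQL expression."""
    score = 0.0
    previous = 0.0
    for days, weight in WEIGHTS:
        cumulative = window_sum(events, now, days)
        # cumulative - previous isolates the points that fall in this band only.
        score += (cumulative - previous) * weight
        previous = cumulative
    return score


if __name__ == "__main__":
    now = datetime(2012, 6, 29)
    events = [(now - timedelta(days=d), 8.0) for d in (1, 10, 20, 25, 40)]
    print(decayed_score(events, now))
```

In the example data each of the four bands holds 8 points and the event from 40 days ago is ignored, so the score works out to 8 + 4 + 2 + 1 = 15.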

Page 16: Real-time Big Data streaming integration for sensor networks

Streaming SQL – Change in Rate of Service Errors

» Millions of records per second
» Real-time Bollinger Bands
» Amazon EC2

    -- Flag a URL whose error count exceeds its windowed average
    -- by more than two standard deviations
    SELECT STREAM ROWTIME, url, "numErrorsLastMinute"
    FROM (
        SELECT STREAM
            ROWTIME, url, "numErrorsLastMinute",
            AVG("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
                AS "avgErrorsPerMinute",
            STDDEV("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
                AS "stdDevErrorsPerMinute"
        FROM "ServiceRequestsPerMinute") AS S
    WHERE S."numErrorsLastMinute" > S."avgErrorsPerMinute" + 2 * S."stdDevErrorsPerMinute";
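The upper-band test in the query above can be sketched in a few lines of Python: keep a rolling window of error counts per URL and flag a reading that exceeds the window mean by more than two standard deviations. This is an illustration only. The class name, the configurable window length, and the use of the population standard deviation are assumptions; the slide does not say which variant `STDDEV` computes.

```python
import statistics
from collections import defaultdict, deque


class ErrorSpikeDetector:
    """Rolling mean + k * stddev threshold per URL (hypothetical helper)."""

    def __init__(self, window_seconds: int = 60, k: float = 2.0):
        self.window = window_seconds
        self.k = k
        self.history = defaultdict(deque)  # url -> deque of (ts, errors)

    def observe(self, ts: int, url: str, errors: float) -> bool:
        """Return True when `errors` is above the band.

        The current reading is part of the window, as it is for an
        OVER (... PRECEDING) aggregate. Readings must arrive in time order.
        """
        readings = self.history[url]
        readings.append((ts, errors))
        # Drop readings older than the window; the newest one always stays.
        while readings[0][0] < ts - self.window:
            readings.popleft()
        values = [e for _, e in readings]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # assumption: population stddev
        return errors > mean + self.k * std


if __name__ == "__main__":
    detector = ErrorSpikeDetector(window_seconds=600)
    for minute in range(9):
        detector.observe(minute * 60, "/checkout", 1)   # steady background
    print(detector.observe(540, "/checkout", 10))        # sudden jump
```

Because the band is recomputed from recent data for each URL, the threshold adapts to each service's normal error rate instead of relying on a fixed limit.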

Page 17: Real-time Big Data streaming integration for sensor networks

Use Cases for S3 Data (Sensor x System x Service)

» Sensor Data:
    » Location, Power, Temperature, Pressure, Speed, …
    » GPS and Mobile Devices, RFID
» System Data:
    » Log files, Device records, SNMP MIBs
» Service Data:
    » Usage log files, transactions, Internet, other
» Industries & Applications:
    » Energy, Mining, Transportation, Manufacturing, Logistics, etc.
    » Performance, Security, Compliance, and Fraud Monitoring
    » Error and Service Level Monitoring
    » Usage, Metering and SCADA

Page 18: Real-time Big Data streaming integration for sensor networks

Comparison of Big Data Processing Platforms

Hadoop style: data chunking, coarse-grained dataflow
Relational Streaming: DAGs of fine-grained dataflow

Hadoop                      Streaming
Petabytes of stored data    Millions of events per sec
Batch processing            Stream processing
Historical queries          Continuous queries
High latency                Low latency

Page 19: Real-time Big Data streaming integration for sensor networks

Relational Streaming overlaying Hadoop

» Relational Stream Processors co-located with Hadoop Servers to stream/re-stream local data
» Combination performs real-time and historical processing:
    » Querying the future – Continuous ETL and Analytics (parallel pipelines)
    » Querying the past – Hadoop batch jobs on stored tuples (parallel batches)
    » Re-streaming and re-querying (for example, scenario & sensitivity analyses)

[Diagram: Hadoop & Relational Streaming Server. Operator labels: Group, Agg, Join, Project, Select; Reduce, Combine, Map, Split; Sort; Order]
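The difference between querying the past, querying the future, and re-streaming can be shown with one small aggregate. The sketch below is plain Python with made-up names (`batch_average`, `RunningAverage`, `restream`) and `(timestamp, value)` rows; it makes no claim about how Hadoop or the streaming server implement these modes.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, float]  # (timestamp, value)


def batch_average(stored_rows: Iterable[Row]) -> float:
    """Querying the past: one pass over a complete, stored data set."""
    values = [value for _, value in stored_rows]
    return sum(values) / len(values)


class RunningAverage:
    """Querying the future: the answer is updated as each row arrives."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.total += value
        return self.total / self.count


def restream(stored_rows: Iterable[Row], query: RunningAverage) -> List[float]:
    """Re-streaming: replay stored rows in time order through a continuous query."""
    return [query.update(value) for _, value in sorted(stored_rows)]


if __name__ == "__main__":
    stored = [(3, 30.0), (1, 10.0), (2, 20.0)]
    print(batch_average(stored))               # single answer after the fact
    print(restream(stored, RunningAverage()))  # the answer as it evolved
```

Replaying stored rows through the same continuous query is what makes scenario and sensitivity analyses possible: the query logic stays fixed while the input, or a parameter, is varied between runs.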

Page 20: Real-time Big Data streaming integration for sensor networks

Relational Streaming: A new data management quadrant

                               Historical analysis,    Continuous analysis,
                               periodic batches        real-time processing
High-level declarative         Data Warehouses         Relational Streaming
language & operation
Low-level procedural           Hadoop Big Data         Messaging Middleware
language & operation

Page 21: Real-time Big Data streaming integration for sensor networks

Relational Streaming – the Next Wave of Big Data.

Query the Future

Parallel Processing
Parallel processing made easy, auto-optimized, massive scale.

Real-time Analysis
Process, analyze, and react – all in real-time.

RT Data Integration
Continuous, real-time data integration:
• Give each app the view of data and format it needs
• Share all your data in real-time with all your apps
• Perform Continuous ETL and Data Integration
