React Fast by Processing Streaming Data in Real-Time

Post on 06-Jan-2017

React Fast by Processing Streaming Data

Kobi Biton, Solutions Architect, AWS

Ran Tessler, Mgr. Solutions Architecture, AWS

SpoTaxi

• Mobile Apps
• Web Clickstream
• Application Logs
• Metering Records
• IoT Sensors
• Smart Buildings

[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

Most data is produced continuously

Recent data is highly valuable
• If you act on it in time
• Perishable Insights (M. Gualtieri, Forrester)

Old + Recent data is more valuable
• If you have the means to combine them

The diminishing value of data

• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable

What are the key requirements?

Ingest → Transform → Analyze → React → Persist

Processing real-time, streaming data

Amazon Kinesis Streams

Easy administration: Create a stream, set capacity level with shards. Scale to match your data throughput rate & volume.

Build real-time applications: Process streaming data w/ Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda,...

Low cost: Cost-efficient for workloads of any scale.
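Setting the capacity level with shards can be sketched as a small calculation. The snippet below is a rough estimate only, not an official sizing tool: it assumes the commonly published per-shard limits for Kinesis Streams (about 1 MB/s or 1,000 records/s for writes, about 2 MB/s for reads shared by all consumers), and the function name `shards_needed` is made up for illustration.

```python
import math

# Assumed per-shard limits (check the current service quotas before relying on them).
SHARD_WRITE_BYTES_PER_SEC = 1024 * 1024      # ~1 MB/s ingest per shard
SHARD_WRITE_RECORDS_PER_SEC = 1000           # 1,000 records/s ingest per shard
SHARD_READ_BYTES_PER_SEC = 2 * 1024 * 1024   # ~2 MB/s read per shard, shared by consumers


def shards_needed(records_per_sec: float, avg_record_bytes: float, consumers: int = 1) -> int:
    """Estimate the shard count for a given write rate and number of consumers."""
    write_bytes = records_per_sec * avg_record_bytes
    by_write_bytes = math.ceil(write_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    # Every consumer reads the full stream, so read traffic scales with consumers.
    by_read_bytes = math.ceil(write_bytes * consumers / SHARD_READ_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # Many small records: the record-count limit dominates.
    print(shards_needed(records_per_sec=5000, avg_record_bytes=200))
    # Fewer, larger records read by three consumers: the read limit dominates.
    print(shards_needed(records_per_sec=100, avg_record_bytes=50 * 1024, consumers=3))
```

The estimate takes the largest of the three constraints, since whichever limit is hit first determines how many shards the stream needs.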

Amazon Kinesis Firehose

Zero administration: Capture and deliver streaming data to Amazon S3, Redshift, Elasticsearch w/o writing an app or managing infrastructure.

Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery in as little as 60 seconds

Seamless elasticity: Seamlessly scales to match data throughput w/o intervention
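The batch-and-compress behavior can be illustrated with a local toy model. This is not how Firehose is implemented and it uses no AWS API; it is a minimal sketch assuming a buffer that flushes when either a size threshold or an age threshold is reached. The class `BufferedDelivery` and its parameters are hypothetical names.

```python
import gzip
import time


class BufferedDelivery:
    """Toy buffer: collects records and delivers one gzip-compressed batch
    when the buffer reaches max_bytes or its oldest record reaches max_age_sec."""

    def __init__(self, deliver, max_bytes=1024, max_age_sec=60.0, clock=time.monotonic):
        self.deliver = deliver          # callable that receives the compressed batch
        self.max_bytes = max_bytes
        self.max_age_sec = max_age_sec
        self.clock = clock              # injectable clock, handy for testing
        self._buf = []
        self._size = 0
        self._started = None

    def put(self, record: bytes) -> None:
        if self._started is None:
            self._started = self.clock()
        self._buf.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._started >= self.max_age_sec
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._buf:
            return
        # Newline-delimited records, compressed as a single object.
        self.deliver(gzip.compress(b"\n".join(self._buf) + b"\n"))
        self._buf, self._size, self._started = [], 0, None


if __name__ == "__main__":
    batches = []
    buf = BufferedDelivery(batches.append, max_bytes=20)
    buf.put(b"0123456789")
    buf.put(b"0123456789")      # reaches 20 bytes, triggers a flush
    print(len(batches), gzip.decompress(batches[0]))
```

One simplification to note: this sketch only checks the age threshold when a new record arrives, whereas a managed service would flush on a timer even if the source goes quiet.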

Capture and submit streaming data to Firehose

Analyze streaming data using your favorite BI tools

Firehose loads streaming data continuously into S3, Amazon Redshift and Amazon Elasticsearch

Amazon Kinesis Analytics

Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery Stream and apply SQL skills.

Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies.

Easy scalability: Elastically scales to match data throughput.

Connect to Kinesis streams, Firehose delivery streams

Run standard SQL queries against data streams

Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real-time

Use SQL to build real-time applications

Easily write SQL code to process streaming data

Connect to streaming source

Continuously deliver SQL results

A streaming table is a STREAM

• In relational databases, you work with SQL tables
• With Kinesis Analytics, you work with STREAMs
• SELECT, INSERT, and CREATE can be used with STREAMs

CREATE STREAM Tweets(author VARCHAR(20), text VARCHAR(140));

INSERT INTO Tweets SELECT …

A simple streaming query

• Tweets about the DLD Festival Summit
• Selecting from a STREAM of tweets, an in-application stream
• Each row has a corresponding ROWTIME

SELECT STREAM ROWTIME, author, text
FROM Tweets
WHERE text LIKE '%#DLDTelAviv%'
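For readers more at home in a general-purpose language, the same row-by-row filter can be approximated with a Python generator. This is only an analogy for the continuous query, not Kinesis Analytics code; the `Tweet` type and the `select_hashtag` function are illustrative names, and the sample rows are made up.

```python
from typing import Iterable, Iterator, NamedTuple


class Tweet(NamedTuple):
    rowtime: float   # stands in for ROWTIME, in seconds
    author: str
    text: str


def select_hashtag(tweets: Iterable[Tweet], tag: str = "#DLDTelAviv") -> Iterator[Tweet]:
    """Yield each tweet whose text contains the tag, as soon as it arrives."""
    for tweet in tweets:
        # Case-sensitive substring match, like LIKE '%tag%'.
        if tag in tweet.text:
            yield tweet


if __name__ == "__main__":
    incoming = [
        Tweet(0.0, "ran", "Getting ready for #DLDTelAviv"),
        Tweet(1.5, "kobi", "Lunch time"),
        Tweet(2.0, "kobi", "Streaming demo at #DLDTelAviv now"),
    ]
    for row in select_hashtag(incoming):
        print(row.rowtime, row.author, row.text)
```

Because the generator consumes one row at a time and never needs the whole input, it would work unchanged on an iterator that never ends, which is the key property of a query over a STREAM.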

Writing queries on unbounded datasets

• Streams are unbounded data sets
• Need continuous queries, row by row or across rows
• WINDOWS define a start and end to the query

SELECT STREAM author, count(author) OVER ONE_MINUTE
FROM Tweets
WINDOW ONE_MINUTE AS (PARTITION BY author RANGE INTERVAL '1' MINUTE PRECEDING);
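What this windowed query computes can be sketched in plain Python: for every incoming row, emit the author together with the number of rows from that author in the preceding minute. The sketch assumes rows arrive in ROWTIME order and that the window boundary is inclusive; `sliding_counts` is an illustrative name, not part of any Kinesis API.

```python
from collections import defaultdict, deque


def sliding_counts(rows, window_sec=60.0):
    """For each (rowtime, author) row, yield (author, count of that author's
    rows with rowtime in [rowtime - window_sec, rowtime])."""
    recent = defaultdict(deque)   # author -> rowtimes still inside the window
    for rowtime, author in rows:
        times = recent[author]
        times.append(rowtime)
        # Drop rows that have slid out of the window for this author.
        while times[0] < rowtime - window_sec:
            times.popleft()
        yield author, len(times)


if __name__ == "__main__":
    rows = [(0, "a"), (10, "b"), (30, "a"), (61, "a"), (200, "a")]
    for author, count in sliding_counts(rows):
        print(author, count)
```

One output row is produced per input row, and the per-author deque plays the role of PARTITION BY: each author's window is tracked independently.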

Different types of Windows

Tumbling

Sliding
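As a rough illustration of the tumbling case: a tumbling window splits time into fixed, non-overlapping intervals and emits one result per interval, whereas a sliding window moves with every row. The sketch below is a minimal Python model under the assumption that rows arrive in time order; `tumbling_counts` is an illustrative name.

```python
from collections import Counter


def tumbling_counts(rows, window_sec=60):
    """Group (rowtime, author) rows into fixed windows of window_sec seconds
    and yield (window_start, {author: count}) when each window closes."""
    current_window = None
    counts = Counter()
    for rowtime, author in rows:
        window = int(rowtime // window_sec)
        if current_window is not None and window != current_window:
            # The first row of a later window closes the previous one.
            yield current_window * window_sec, dict(counts)
            counts = Counter()
        current_window = window
        counts[author] += 1
    if current_window is not None:
        # Input ended: emit the last, still-open window.
        yield current_window * window_sec, dict(counts)


if __name__ == "__main__":
    rows = [(0, "a"), (10, "b"), (30, "a"), (61, "a"), (200, "a")]
    for start, per_author in tumbling_counts(rows):
        print(start, per_author)
```

Two simplifications are worth keeping in mind: windows that receive no rows are not emitted here, and the final flush exists only because the sample input is finite, while a real stream never ends.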

Amazon Kinesis: Streaming Data Made Easy

Services make it easy to capture, deliver and process streams on AWS

Amazon Kinesis Firehose
For all developers, data scientists
Easily load massive volumes of streaming data into Amazon S3, Redshift, and Elasticsearch

Amazon Kinesis Streams
For technical developers
Collect and stream data for ordered, replayable, real-time processing

Amazon Kinesis Analytics
For all developers, data scientists
Easily analyze data streams using standard SQL queries

Demo – detailed architecture

• taxi-telemetry (Kinesis Stream)
• taxi-stats (Kinesis Stream)
• CalculateStats (Kinesis Analytics)
• PipeStatsToDDB (AWS Lambda)
• stats (Amazon DynamoDB)
• taxi-telemetry-to-s3 (Kinesis Firehose)
• spottaxi-data (Amazon S3)
• PipeTelemetryToFirehose (AWS Lambda)
• spottaxi (Amazon Elasticsearch Service)
• PipeTelemetryToES (AWS Lambda)
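The demo's Lambda code is not included in the slides, so the following is only a guess at the shape of a function like PipeStatsToDDB. It relies on the standard Kinesis event format delivered to Lambda, where each record carries its payload base64-encoded under `kinesis.data`. It assumes the stats records are JSON; the field names in the sample (`taxi_id`, `avg_speed`) and the injected `put_item` callable are hypothetical.

```python
import base64
import json


def decode_kinesis_records(event):
    """Decode the base64 payload of every record in a Kinesis-triggered event.
    Assumes each payload is a JSON document."""
    items = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        items.append(json.loads(payload))
    return items


def make_handler(put_item):
    """Build a Lambda-style handler that passes each decoded record to put_item.
    put_item is injected so the sketch runs without AWS credentials."""
    def handler(event, context):
        items = decode_kinesis_records(event)
        for item in items:
            put_item(item)
        return {"written": len(items)}
    return handler


if __name__ == "__main__":
    stored = []
    handler = make_handler(stored.append)
    sample = {"taxi_id": "t1", "avg_speed": 42}   # hypothetical stats record
    event = {"Records": [{"kinesis": {
        "data": base64.b64encode(json.dumps(sample).encode()).decode()}}]}
    print(handler(event, None), stored)
```

In a deployment, `put_item` might wrap a write to the stats table in DynamoDB; injecting it keeps the decoding logic testable on its own.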