Real time analytics with Kafka and SparkStreaming

25
1 © Cloudera, Inc. All rights reserved. Real-time analytics with Kafka and Spark Streaming Ashish Singh | Software Engineer, Cloudera

Transcript of Real time analytics with Kafka and SparkStreaming

Page 1: Real time analytics with Kafka and SparkStreaming

1© Cloudera, Inc. All rights reserved.

Real-time analytics with Kafka and Spark StreamingAshish Singh | Software Engineer, Cloudera

Page 2: Real time analytics with Kafka and SparkStreaming

2© Cloudera, Inc. All rights reserved.

It’s Real-Time timeWhy now? Complex Event Processing (CEP) is not a new concept.

Page 3: Real time analytics with Kafka and SparkStreaming

3© Cloudera, Inc. All rights reserved.

It’s Real-Time time

Emergence of Real-Time

Stream Processing

Exponential growth in

continuous data streams

Open Source tools for

reliable high-throughput low latency event

queuing and processing

Tools run on “Commodity”

Hardware

Why now? Complex Event Processing (CEP) is not a new concept.

Page 4: Real time analytics with Kafka and SparkStreaming

4© Cloudera, Inc. All rights reserved.

It’s happening! …Across IndustriesCredit Card& MonetaryTransactionsIdentifyfraudulent transactions as soon as they occur.

Transportation& Logistics• Real-timetrafficconditions• Trackingfleet and cargo locations and dynamic re-routing to meet SLAs

Retail• Real-timein-storeOffers and Recommendations.• Email andmarketing campaigns based on real-time social trends

Consumer Internet, Mobile & E-Commerce

Optimize userengagement basedon user’s currentbehavior. Deliver recommendations relevant “in the moment”

HealthcareContinuouslymonitor patientvital stats and proactively identifyat-risk patients.

Manufacturing• Identifyequipmentfailures and react instantly• Perform proactivemaintenance.• Identify product quality defects immediately to prevent resource wastage.

Security & Surveillance

Identifythreatsand intrusions, both digital and physical, in real-time.

Digital Advertising& MarketingOptimize and personalize digital ads based on real-time information.

Page 5: Real time analytics with Kafka and SparkStreaming

5© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Page 6: Real time analytics with Kafka and SparkStreaming

6© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

Page 7: Real time analytics with Kafka and SparkStreaming

7© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

FilterEnrich

TransformStats on Sliding Windows

Stream JoinsFeature Engineering Predictive Analytics

Active Model Training....

And combinations of the above

Page 8: Real time analytics with Kafka and SparkStreaming

8© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

HDFS

Page 9: Real time analytics with Kafka and SparkStreaming

9© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

NoSql

HDFS

Page 10: Real time analytics with Kafka and SparkStreaming

10© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

NoSql

HDFS

Page 11: Real time analytics with Kafka and SparkStreaming

11© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

NoSql

HDFS

Page 12: Real time analytics with Kafka and SparkStreaming

12© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

NoSql

HDFS

Page 13: Real time analytics with Kafka and SparkStreaming

13© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

NoSql

HDFS

Kafka

Page 14: Real time analytics with Kafka and SparkStreaming

14© Cloudera, Inc. All rights reserved.

Canonical Stream Processing Architecture

DataSources

Kafka Flume

NoSql

HDFS

Kafka ...

Page 15: Real time analytics with Kafka and SparkStreaming

15© Cloudera, Inc. All rights reserved.

Too much?

15© Cloudera, Inc. All rights reserved.

Page 16: Real time analytics with Kafka and SparkStreaming

16© Cloudera, Inc. All rights reserved.

Example application to demonstrate how real time analytics can be done using Kafka and Spark Streaming

Pankh

https://github.com/SinghAsDev/pankh

Page 17: Real time analytics with Kafka and SparkStreaming

17© Cloudera, Inc. All rights reserved.

Pankh – Building Pieces

DataSources

Page 18: Real time analytics with Kafka and SparkStreaming

18© Cloudera, Inc. All rights reserved.

Pankh – Building Pieces

DataSources

Kafka

Page 19: Real time analytics with Kafka and SparkStreaming

19© Cloudera, Inc. All rights reserved.

Pankh – Building Pieces

DataSources

Kafka

Page 20: Real time analytics with Kafka and SparkStreaming

20© Cloudera, Inc. All rights reserved.

Pankh – Building Pieces

DataSources

Kafka

NoSql

Page 21: Real time analytics with Kafka and SparkStreaming

21© Cloudera, Inc. All rights reserved.

Pankh – Building Pieces

DataSources

Kafka

NoSql

Page 22: Real time analytics with Kafka and SparkStreaming

22© Cloudera, Inc. All rights reserved.

Demo Time

22© Cloudera, Inc. All rights reserved.

Page 23: Real time analytics with Kafka and SparkStreaming

25© Cloudera, Inc. All rights reserved.

Kappa Architecture

Page 24: Real time analytics with Kafka and SparkStreaming

26© Cloudera, Inc. All rights reserved.

Demo Time

26© Cloudera, Inc. All rights reserved.

Page 25: Real time analytics with Kafka and SparkStreaming

27© Cloudera, Inc. All rights reserved.

Thank youAshish [email protected]@singhasdev