Streaming Visualization

Post on 16-Mar-2020


BASEL | BERN | BRUGG | BUCHAREST | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I. BR. | GENEVA | HAMBURG | COPENHAGEN | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH

http://guidoschmutz.wordpress.com
@gschmutz

Streaming Visualization
DOAG Konferenz 2019
Guido Schmutz

Agenda

1. Motivation / Introduction

2. Stream Data Integration & Stream Analytics Ecosystem

3. Three Blueprints for Streaming Visualization

End-to-End Demo available here: https://github.com/gschmutz/various-demos/tree/master/streaming-visualization


Guido
Working at Trivadis for more than 22 years
Consultant, Trainer, Platform Architect for Java, Oracle, SOA and Big Data / Fast Data
Oracle Groundbreaker Ambassador & Oracle ACE Director

@gschmutz guidoschmutz.wordpress.com

175th edition

Motivation / Introduction

Timely decisions require new data immediately

Keep the data in motion …

[Diagram: Data at Rest vs. Data in Motion, each shown with the steps Store, Analyze, Visualize and (Re)Act]

Reference Architecture for Data Analytics Solutions

[Diagram. Labels, grouped as far as recoverable:
• Sources: Bulk Source (File, DB, DB Extract), Event Source (Location, IoT Data, Mobile Apps, Social, Telemetry)
• Ingestion: Data Flow, Change Data Capture, File Import / SQL Import, Event Stream, Event Hub
• Edge Node: Rules, Event Hub, Storage
• Big Data: Hadoop Cluster, Storage (Raw, Refined), Parallel Processing, Results, SQL Export
• Stream Analytics: Stream Processor, State, API, Event Stream
• Microservices / Enterprise Apps: Logic, Microservice State, API
• Consumption: Enterprise Data Warehouse, BI Tools, SQL, Search / Explore, Search Service]

Two Types of Stream Processing (by Gartner)

Stream Data Integration
• focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases
• filters and enriches the data

Stream Analytics
• targets analytics use cases
• calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business

Source: Gartner, Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
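The difference between the two types can be illustrated with a minimal Python sketch. It is not taken from the talk: the sample events, their field names and the 60-second window are assumptions made only for illustration. `integrate` works event by event (filter and enrich), while `count_per_window` derives summary information across events.

```python
from collections import Counter

# Hypothetical sample events; field names are assumptions for this sketch.
EVENTS = [
    {"ts": 0, "user": "alice", "lang": "en", "text": "kafka rocks"},
    {"ts": 20, "user": "bob", "lang": "fr", "text": "bonjour"},
    {"ts": 45, "user": "carol", "lang": "de", "text": "hallo kafka"},
    {"ts": 70, "user": "alice", "lang": "en", "text": "streaming sql"},
]

# Hypothetical reference data used for enrichment.
LANGUAGE_NAMES = {"en": "English", "de": "German"}


def integrate(events):
    """Stream data integration: filter and enrich events one at a time."""
    for event in events:
        if event["lang"] not in LANGUAGE_NAMES:
            continue  # filter: drop events in other languages
        # enrich: add the language name from the reference data
        yield {**event, "language": LANGUAGE_NAMES[event["lang"]]}


def count_per_window(events, window_seconds=60):
    """Stream analytics: count events per tumbling time window."""
    counts = Counter()
    for event in events:
        window_start = event["ts"] // window_seconds * window_seconds
        counts[window_start] += 1
    return dict(counts)


integrated = list(integrate(EVENTS))
per_window = count_per_window(integrated)
```

In a real deployment both steps would run continuously on an unbounded stream; the lists here only stand in for that stream.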

Stream Data Integration & Stream Analytics Ecosystem

[Diagram: landscape of open source and closed source products, grouped into Stream Analytics, Event Hub, Stream Data Integration and Edge. Source: adapted from Tibco]

Apache Kafka – A Streaming Platform

[Diagram: Kafka Cluster with Broker 1, Broker 2 and Broker 3; Zookeeper Ensemble with ZK 1, ZK 2 and ZK 3; Schema Registry (Service 1); Management tools (Control Center, Kafka Manager, KAdmin); kafkacat; Producers 1 to 3 and Consumers 1 to 3]

Data Retention:
• Never
• Time (TTL) or size-based
• Log-compacted
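As a rough illustration of the retention options, the sketch below maps each one to Kafka topic-level settings (`cleanup.policy`, `retention.ms`, `retention.bytes`). It reads "Never" as "never delete", which is an assumption; the profile names and the concrete limits (7 days, 1 GiB) are also made up for the example.

```python
# Topic-level settings for the retention options; values are illustrative assumptions.
RETENTION_PROFILES = {
    # "Never" interpreted as: keep data forever (no time or size limit)
    "never": {"cleanup.policy": "delete", "retention.ms": "-1", "retention.bytes": "-1"},
    # Time-based: delete log segments older than 7 days
    "time": {"cleanup.policy": "delete", "retention.ms": str(7 * 24 * 60 * 60 * 1000)},
    # Size-based: delete old segments once a partition exceeds 1 GiB
    "size": {"cleanup.policy": "delete", "retention.bytes": str(1024 ** 3)},
    # Log-compacted: keep at least the latest value per key
    "compacted": {"cleanup.policy": "compact"},
}


def config_args(profile):
    """Render a profile as --config arguments for the kafka-topics CLI."""
    args = []
    for name, value in RETENTION_PROFILES[profile].items():
        args += ["--config", f"{name}={value}"]
    return args


compacted_args = config_args("compacted")
time_args = config_args("time")
```

The resulting argument list could be appended to a `kafka-topics --create` call for the topic in question.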

Apache Kafka – A Streaming Platform

[Diagram: Source Connector, Kafka Broker (trucking_driver), Sink Connector, KSQL Engine, Kafka Streams]

Demo using Kafka Stack for Stream Data Integration

[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, Streaming Visualization (consumer); the data flow into the visualization is marked "??"]

Filter: #doag2019, …
User: @gschmutz

Demo: Kafka Connect to retrieve Tweets

curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
     -H "Content-Type: application/json" \
     --data '{
  "name": "twitter-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
    "twitter.oauth.consumerKey": "xxxxx",
    "twitter.oauth.consumerSecret": "xxxxx",
    "twitter.oauth.accessToken": "xxxx",
    "twitter.oauth.accessTokenSecret": "xxxxx",
    "process.deletes": "false",
    "filter.keywords": "#doag2019",
    "filter.userIds": "15148494",
    "kafka.status.topic": "tweet-raw-v1",
    "tasks.max": "1"
  }
}'

Demo: KSQL for Streaming ETL

CREATE STREAM tweet_raw_s
  WITH (KAFKA_TOPIC='tweet-raw-v1', VALUE_FORMAT='AVRO');

CREATE STREAM tweet_s
  WITH (KAFKA_TOPIC='tweet-v1', VALUE_FORMAT='AVRO', PARTITIONS=8)
AS SELECT id, createdAt, text, user->screenName
FROM tweet_raw_s;

SELECT id, lang, removestopwords(split(LCASE(text), ' ')) AS word
FROM tweet_raw_s
WHERE lang = 'en' OR lang = 'de';

SELECT id, LCASE(hashtagentities[0]->text)
FROM tweet_raw_s
WHERE hashtagentities[0] IS NOT NULL;
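To make the word-extraction query easier to follow, here is a plain-Python sketch of what it appears to compute for a single tweet. `removestopwords` is not a built-in KSQL function, so it is presumably a user-defined function from the demo; its behaviour and the stop-word list below are assumptions.

```python
# Hypothetical stop-word list; the one used by the demo's UDF is not shown in the slides.
STOP_WORDS = {"the", "a", "is", "and", "der", "die", "das", "und"}


def remove_stop_words(words):
    """Assumed behaviour of the removestopwords UDF: drop stop words, keep order."""
    return [word for word in words if word not in STOP_WORDS]


def words_of(tweet):
    """Mirror of: SELECT id, lang, removestopwords(split(LCASE(text), ' ')) AS word
    ... WHERE lang = 'en' OR lang = 'de'. Returns None for filtered-out tweets."""
    if tweet["lang"] not in ("en", "de"):
        return None
    words = tweet["text"].lower().split(" ")
    return {"id": tweet["id"], "lang": tweet["lang"], "word": remove_stop_words(words)}


english = words_of({"id": 1, "lang": "en", "text": "Kafka is the Best"})
french = words_of({"id": 2, "lang": "fr", "text": "Bonjour Kafka"})
```

The KSQL version applies the same logic continuously to every new record on the stream instead of to a single dictionary.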

Demo using Kafka Stack for Stream Data Integration

Filter: #voxxeddaysbanff, #java, #kafka, …
User: @VoxxedDaysBanff, @gschmutz

Visualization: many many options!

But do they all support Streaming Data?

Three Blueprints for Streaming Visualization

BP1: Fast datastore with regular polling from consumer

[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics (Data in Motion); Storage / Data Store with API (Data at Rest); Data Flow to the Streaming Visualization (consumer)]
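The polling pattern of this blueprint can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: `LatestValueStore` replaces the real data store, and the key name, interval and number of rounds are arbitrary.

```python
import time


class LatestValueStore:
    """In-memory stand-in for the fast data store (an assumption for this sketch)."""

    def __init__(self):
        self._values = {}

    def put(self, key, value):
        # Called by the stream processing side whenever a new result arrives.
        self._values[key] = value

    def get(self, key):
        # Called by the visualization side on every poll.
        return self._values.get(key)


def poll(store, key, interval_seconds, rounds, render):
    """Read the store at a fixed interval and hand each value to `render`."""
    for _ in range(rounds):
        render(store.get(key))
        time.sleep(interval_seconds)


store = LatestValueStore()
store.put("tweets_per_hour", 23)

seen = []
poll(store, "tweets_per_hour", interval_seconds=0.01, rounds=3, render=seen.append)
```

The consumer only sees a new value on its next poll, so the polling interval adds directly to the end-to-end latency of the dashboard.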

BP1-1: Elasticsearch / Kibana

Alternatives: Solr & Banana

BP1-2: InfluxDB / Grafana or Chronograf

Alternatives: Prometheus & Grafana, Druid & Superset

BP1-3: NoSQL & Custom Web


BP1-3: Demo Redis NoSQL & Custom Web

https://opensky-network.org/

BP1-4: Kafka Streams Interactive Query & Custom App

Alternatives: Flink…

BP2: Direct Streaming to the Consumer

[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics (Data in Motion); Data Flow over a Channel/Protocol and API to the Streaming Visualization (consumer)]

BP2-1: Kafka Connect to Slack / WhatsApp

Alternatives: Twitter, SMS, …

BP2-1: Demo Kafka Connect to Slack

curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
     -H "Content-Type: application/json" \
     --data '{
  "name": "slack-sink",
  "config": {
    "connector.class": "net..SlackSinkConnector",
    "tasks.max": "1",
    "topics": "slack-notify",
    "slack.token": "XXXX",
    "slack.channel": "general",
    "message.template": "tweet by ${USER_SCREENNAME} with ${TEXT}"
  }
}'

BP2-2: Kafka to Tipboard (Dashboard Solution)

Alternatives: Dashing, Geckoboard, …

BP2-2: Demo Kafka to Tipboard (Dashboard Solution)

http://allegro.tech/tipboard/

BP2-2: Demo Kafka to Tipboard (Dashboard Solution)

import json
import requests

# c is the Kafka consumer created earlier in the demo (the poll()/value() calls
# match confluent_kafka.Consumer); TILE_NAME, TILE_KEY and API_URL_PUSH are
# defined elsewhere in the demo code.

def prepare_for_just_value(data):
    # Payload shape expected by the tile, e.g.
    # {"title": "Number of Tweets:", "description": "(1 hour)", "just-value": "23"}
    return {'title': '# Tweets:', 'description': 'per hour', 'just-value': data}

c.subscribe(['DASH_TWEET_COUNT_BY_HOUR_T'])

while True:
    msg = c.poll(1.0)
    if msg is None or msg.error():
        continue  # nothing received within the timeout, or an error event
    data = json.loads(msg.value().decode('utf-8'))
    data_selected = data.get('NOF_TWEETS')
    data_prepared = prepare_for_just_value(data_selected)
    data_jsoned = json.dumps(data_prepared)
    data_to_push = {'tile': TILE_NAME, 'key': TILE_KEY, 'data': data_jsoned}
    resp = requests.post(API_URL_PUSH, data=data_to_push)

BP2-3: Web Sockets / SSE & Custom Modern Web App

Server-Sent Events (SSE)
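For readers unfamiliar with Server-Sent Events, the following sketch shows the `text/event-stream` wire format a server would write for each message. The function name, the event name `tweet-count` and the sample payload are assumptions for illustration; a real application would write these strings to an open HTTP response.

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload is sent as its own "data:" field.
    for part in str(data).split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


message = format_sse('{"NOF_TWEETS": 23}', event="tweet-count")
multi_line = format_sse("first\nsecond", event_id=7)
```

On the browser side, an `EventSource` object receives these messages without any polling, which is what makes SSE a fit for this blueprint.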

BP3: Streaming SQL Result to Consumer

[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics (Data in Motion); API to the Streaming Visualization (consumer)]

BP3-1: KSQL and Arcadia Data


BP3-1: Demo KSQL and Arcadia Data

https://www.arcadiadata.com/

BP3-2: KSQL with REST API to Custom Web App


BP3-2: Demo KSQL with REST API

curl -X POST -H 'Content-Type: application/vnd.ksql.v1+json' -i http://analyticsplatform:8088/query --data '{
  "ksql": "SELECT text FROM tweet_raw_s;",
  "streamsProperties": { "ksql.streams.auto.offset.reset": "latest" }
}'

{"row":{"columns":["The latest The Naji Filali Daily! https://t.co/9E6GonrySE Thanks to @Xavier_Porter1 @ClouMedia #ai #bigdata"]},"errorMessage":null,"finalMessage":null}

{"row":{"columns":["RT @Futurist_Invest: This robot can copy your face! Creepy \n\n#SaturdayThoughts#SaturdayMorning #creepy #bots #bot #AI #bigdata #robotics #…"]},"errorMessage":null,"finalMessage":null}

{"row":{"columns":["She’s back telling us all about why datathons are exciting now :) Catch her while you can! @ARUKscientist @S_Bauermeister #bigdata #ARUKConf https://t.co/Br484db5ut"]},"errorMessage":null,"finalMessage":null}

{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},"errorMessage":null,"finalMessage":null}
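A custom web backend would consume this response line by line. The sketch below parses lines in the format shown above; the function name and the handling of blank lines and error messages are assumptions, and the sample input is one line copied from the output above rather than a live connection.

```python
import json


def extract_rows(lines):
    """Yield the column lists from a line-delimited KSQL /query response."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, if any
        message = json.loads(line)
        if message.get("errorMessage"):
            raise RuntimeError(message["errorMessage"])
        row = message.get("row")
        if row is not None:
            yield row["columns"]


sample = [
    '{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},'
    '"errorMessage":null,"finalMessage":null}',
    "",
]
texts = [columns[0] for columns in extract_rows(sample)]
```

Because `extract_rows` is a generator, it can be fed from a streaming HTTP response and push each row to the visualization as soon as it arrives.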

BP3-3: Spark Streaming & Oracle Stream Analytics


BP3-3: Demo Spark Streaming & Oracle Stream Analytics

https://www.oracle.com/middleware/technologies/complex-event-processing.html

Summary

BP1: Fast Store & Polling

• “Classic” pattern
• Not end-to-end “data in motion”: data is “at rest” before visualization
• Slight delay might not be acceptable for a monitoring dashboard
• Can use the full power of the data store(s) => NoSQL
• In-memory reduces overhead

BP2: Stream to Consumer

• Minimal latency
• More difficult on the “client side”
• Good if the stream directly holds what should be displayed

• More difficult if data in stream needs to be analyzed before visualization

• No historical info available

BP3: Streaming SQL

• Minimal latency

• Power of SQL query engine available for visualization

• Possibility for “self-service” style visualization

• Some analytics are more difficult on streaming data

• No historical info available