Big Data Applications Made Easy: Fact Or Fiction?

Spring XD. Glenn Renfro, grenfro@pivotal.io, @CPPWFS. Pivotal Confidential, Internal Use Only.

description

With Spring XD, the answer is Fact. In short, Spring XD provides a one-stop shop for writing and deploying Big Data applications. It provides a scalable, fault-tolerant, distributed runtime for data ingestion, analytics, and workflow orchestration using a single programming, configuration, and extensibility model. By reducing the complexity of Big Data development, it lets developers focus on the business problem.

In this discussion, we will:

• Cover the basics of Spring XD
• Show how to deploy streams that handle data received from multiple sources and write the results to various sinks
• Capture some analytics from a live data stream
• Show how to create and execute jobs
• Demonstrate the failover capabilities of an XD cluster
• Discuss how to create your own custom modules

Transcript of Big Data Applications Made Easy: Fact Or Fiction?

Page 1: Big Data Applications Made Easy: Fact Or Fiction?

Spring XD

Glenn Renfro
grenfro@pivotal.io
@CPPWFS

Page 2: Big Data Applications Made Easy: Fact Or Fiction?

Volume, Variety, Velocity, Veracity

• 420 million wearables
• 90% of enterprise data is unstructured
• 60-100 sensors in each car
• 22 billion sensors by 2020
• 86% suspect data inaccuracy
• 30% revenue loss due to bad data quality
• 500 million tweets each day
• 2.3 trillion GBs of data each day

Data Points: McKinsey, Twitter, Gartner, IBM

Page 3: Big Data Applications Made Easy: Fact Or Fiction?

• Batch and streaming are often handled by multiple platforms
• Fragmented Big Data ecosystem
• Not all data is Hadoop-bound

Page 4: Big Data Applications Made Easy: Fact Or Fiction?

“One-stop shop for developing and deploying Big Data Applications”

SPRING XD

EXTREME DATA

Page 5: Big Data Applications Made Easy: Fact Or Fiction?

Spring XD to the Rescue

• Batch and streaming are often handled by multiple platforms
• Fragmented Big Data ecosystem
• Not all data is Hadoop-bound

• Portable: on-prem, YARN, EC2, PCF, Mesos, Docker, etc.
• Easy to use, extend, and integrate with other technologies
• Built on proven Spring EAI and Batch projects

(Volume, Velocity, Veracity, and Variety)

• Unified stream and batch operations
• Hadoop batch workflow orchestration
• Predictive analytics and model scoring

Page 6: Big Data Applications Made Easy: Fact Or Fiction?

[Diagram: Spring IO platform layers]

IO EXECUTION

• BOOT: Bootable, Minimal, Ops-Ready
• GRAILS: Full-stack, Web
• XD: Stream, Taps, Jobs

IO FOUNDATION

• WEB: Controllers, REST, WebSocket
• INTEGRATION: Channels, Adapters, Filters, Transformers
• BATCH: Jobs, Steps, Readers, Writers
• BIG DATA: Ingestion, Export, Orchestration, Hadoop
• DATA: Relational Data Access, Non-Relational Data Access
• SPRING CORE: FRAMEWORK, SECURITY, GROOVY, REACTOR

IO COORDINATION

• SPRING CLOUD

Page 7: Big Data Applications Made Easy: Fact Or Fiction?

Spring XD - 10,000 Foot View

Page 8: Big Data Applications Made Easy: Fact Or Fiction?

Streams

Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP

Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Groovy, Python, Java

Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters

Page 9: Big Data Applications Made Easy: Fact Or Fiction?

Create a stream with http as a source and hdfs as a sink. The hdfs --rollover option is set to a small value so that we can read the file on HDFS.

Page 10: Big Data Applications Made Easy: Fact Or Fiction?

Spring XD - Distributed Runtime

• XD Shell: HTTP POST /streams/aStream “M1 | M2”
• XD Admin (leader), XD Admin, XD Admin
• ZooKeeper: Container State
• Message Bus
• XD Container, XD Container: Spring App Context, modules M1 and M2

Page 11: Big Data Applications Made Easy: Fact Or Fiction?

Page 12: Big Data Applications Made Easy: Fact Or Fiction?

Page 13: Big Data Applications Made Easy: Fact Or Fiction?

Spring XD - Analytics

• Counters and Gauges
  • Simple & Field Value Counter (how many tweets for #java)
  • Aggregate Counter (how many tweets for #java in the week/day/hr)
  • Gauge & Rich Gauge (how many requests / minute?)
  • Abstract API with Redis and in-memory implementations
• Predictive Model Evaluation
  • JPMML
  • Is this transaction fraudulent?
  • What group does this user belong to?
  • Interoperable with R, Rattle, KNIME, RapidMiner, MADLib
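To make the counter types on this slide concrete, here is a minimal Python sketch of a field value counter and an aggregate counter. It is not Spring XD's API: the class names, the `hashtags` field, and the in-memory dictionaries are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime


class FieldValueCounter:
    """Counts occurrences of each distinct value of one message field."""

    def __init__(self, field_name):
        self.field_name = field_name
        self.counts = Counter()

    def process(self, message):
        value = message.get(self.field_name)
        # A field may hold one value or a list of values (e.g. hashtags).
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                self.counts[v] += 1


class AggregateCounter:
    """Keeps a running count bucketed by hour and by day."""

    def __init__(self):
        self.by_hour = Counter()
        self.by_day = Counter()

    def increment(self, when, amount=1):
        self.by_hour[when.strftime("%Y-%m-%d %H:00")] += amount
        self.by_day[when.strftime("%Y-%m-%d")] += amount


if __name__ == "__main__":
    tags = FieldValueCounter("hashtags")  # hypothetical field name
    for tweet in [{"hashtags": ["java", "spring"]}, {"hashtags": ["java"]}]:
        tags.process(tweet)
    print(tags.counts.most_common())

    per_period = AggregateCounter()
    per_period.increment(datetime(2015, 1, 1, 10, 5))
    per_period.increment(datetime(2015, 1, 1, 11, 0))
    print(dict(per_period.by_hour), dict(per_period.by_day))
```

The field value counter answers "how many tweets for #java", while the aggregate counter answers the same question per hour or per day. A persistent store such as Redis would replace the in-memory `Counter` objects in a distributed setup.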

Page 14: Big Data Applications Made Easy: Fact Or Fiction?

Jobs

CSV to JDBC

FTP to HDFS

JDBC to HDFS

HDFS to JDBC

HDFS to MongoDB

Page 15: Big Data Applications Made Easy: Fact Or Fiction?

[Architecture diagram. Recoverable labels:]

• Data sources: MOBILE, SENSORS, SOCIAL, FILES
• Spring XD: Ingest, Stream Processing, Analytics, Workflow Orchestration, Predictive Modeling, Export
• MASTER DATASET
• SPEED LAYER: REAL-TIME VIEWS
• BATCH LAYER: BATCH VIEWS
• SERVING LAYER
• GemFire XD
• Spring BOOT (three apps)
• PCF - BOSH Service, PCF - Apps

Page 16: Big Data Applications Made Easy: Fact Or Fiction?

• Unified runtime for both real-time and batch use cases
• Scalable, distributed, and fault-tolerant runtime
• Increased productivity through out-of-the-box components
• Closed-loop analytics through online (stream) and offline (batch) data
• Swiss-army knife of data movement and data pipelines
• Repeatable ‘turnkey’ solution for next-generation data-centric use cases

Page 17: Big Data Applications Made Easy: Fact Or Fiction?

Agility: Easy to Set Up and Run

Writing HTTP Data to HDFS

…that simple!

Page 18: Big Data Applications Made Easy: Fact Or Fiction?

Spring XD on YARN

Spring XD Running on YARN!

• Copies files to HDFS, creates manifest.yml (Spring Boot App)
• ‘xd-yarn start admin’ (Spring Boot App)
• ‘xd-yarn start container’ (Spring Boot App)

Page 19: Big Data Applications Made Easy: Fact Or Fiction?

Even easier with PCF

Page 20: Big Data Applications Made Easy: Fact Or Fiction?

Natural Fit: Reactive Streaming Pipelines

• Moving Average: ‘collect values every 500ms’
• Microbatching: ‘either 1024b or 350ms; trigger downstream processing’
• Non-Blocking Backpressure
  • OLD: “take all these items I have whether you can handle them or not”
  • NEW: “give me the next N available items”
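The microbatching rule on this slide ("either 1024b or 350ms") can be sketched in a few lines of Python. This is an illustration, not Reactor or Spring XD code: the function name and the `(timestamp, payload)` event shape are assumptions, and the time trigger is checked only when the next event arrives, whereas a real reactive pipeline would typically use a timer.

```python
from typing import Iterable, Iterator, List, Tuple


def microbatch(events: Iterable[Tuple[float, bytes]],
               max_bytes: int = 1024,
               max_wait_s: float = 0.350) -> Iterator[List[bytes]]:
    """Group (timestamp_seconds, payload) events into batches.

    A batch is emitted once it holds at least max_bytes of payload, or when
    an event arrives max_wait_s or more after the first event in the batch.
    """
    batch: List[bytes] = []
    size = 0
    started = 0.0
    for ts, payload in events:
        # Time trigger: the open batch is too old, so flush it first.
        if batch and ts - started >= max_wait_s:
            yield batch
            batch, size = [], 0
        if not batch:
            started = ts
        batch.append(payload)
        size += len(payload)
        # Size trigger: enough bytes collected, hand the batch downstream.
        if size >= max_bytes:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch  # flush whatever is left at end of input


if __name__ == "__main__":
    stream = [(0.0, b"a" * 600), (0.1, b"b" * 600), (0.2, b"c" * 10),
              (0.6, b"d" * 10), (0.7, b"e" * 10)]
    for b in microbatch(stream):
        print(len(b), "events,", sum(len(p) for p in b), "bytes")
```

Because `microbatch` is a generator, the consumer pulls batches at its own pace, which loosely mirrors the "give me the next N available items" style of backpressure rather than the producer pushing everything it has.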

Page 21: Big Data Applications Made Easy: Fact Or Fiction?

Deployment Manifest – Module Count

• http | doWork | hdfs

[Diagram: 2 http, 4 doWork, and 3 hdfs module instances]

stream deploy --name s1 --properties
  module.http.count=2,
  module.doWork.count=4,
  module.hdfs.count=3

Page 22: Big Data Applications Made Easy: Fact Or Fiction?

Deployment Manifest – Module Placement

• http | doWork | hdfs

[Diagram: 2 http, 4 doWork, and 3 hdfs module instances; the http instances are labeled WEB]

stream deploy --name s1 --properties
  module.http.count=2,
  module.doWork.count=4,
  module.hdfs.count=3,
  module.http.criteria=groups.contains('WEB')
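As a rough illustration of what a placement criterion like groups.contains('WEB') expresses, the Python sketch below filters a set of containers by group and assigns module instances to the matches. It does not reproduce Spring XD's deployer: the `Container` class, the `place` function, the one-instance-per-container rule, and the selection order are all assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Container:
    name: str
    groups: Set[str] = field(default_factory=set)


def place(module: str, count: int, containers: List[Container],
          criteria: Callable[[Container], bool] = lambda c: True) -> List[str]:
    """Return 'module@container' assignments for up to `count` instances.

    Assumption: at most one instance of a module per matching container,
    so fewer than `count` instances are placed if too few containers match.
    """
    eligible = [c for c in containers if criteria(c)]
    return [f"{module}@{c.name}" for c in eligible[:count]]


if __name__ == "__main__":
    cluster = [Container("c1", {"WEB"}), Container("c2", {"BATCH"}),
               Container("c3", {"WEB"})]
    # Equivalent in spirit to module.http.criteria=groups.contains('WEB')
    print(place("http", 2, cluster, lambda c: "WEB" in c.groups))
```

Modules without a criterion fall back to the default predicate, which accepts every container.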

Page 23: Big Data Applications Made Easy: Fact Or Fiction?

Deployment Manifest – Data Partitioning

• http | doWork | hdfs

[Diagram: 2 http (WEB), 4 doWork, and 3 hdfs module instances]

stream deploy --name s1 --properties
  ...
  module.http.producer.partitionKeyExpression=payload.customerId

Each doWork module instance will always process the same set of customer IDs.
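The effect of a partition key can be sketched in Python as follows. The slide does not say how Spring XD maps a key to a partition, so the CRC32-modulo scheme here is a stand-in chosen because it is stable across runs; the function names and the message shape are likewise assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def route(messages: Iterable[dict], partition_count: int = 4) -> Dict[int, List[dict]]:
    """Group messages by the partition derived from payload customerId."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for message in messages:
        index = partition_for(message["customerId"], partition_count)
        buckets[index].append(message)
    return dict(buckets)


if __name__ == "__main__":
    msgs = [{"customerId": "alice", "amount": 10},
            {"customerId": "bob", "amount": 5},
            {"customerId": "alice", "amount": 7}]
    for index, batch in sorted(route(msgs).items()):
        print("doWork instance", index, "->", batch)
```

Every message for a given customerId lands in the same bucket, which is the property the slide describes: with four doWork instances, each one sees a fixed subset of customer IDs.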

Page 24: Big Data Applications Made Easy: Fact Or Fiction?

Learn More

• Project: http://projects.spring.io/spring-xd/

• GitHub: https://github.com/spring-projects/spring-xd/

• Wiki: https://github.com/spring-projects/spring-xd/wiki

• Samples: https://github.com/spring-projects/spring-xd-samples

Page 25: Big Data Applications Made Easy: Fact Or Fiction?

A NEW PLATFORM FOR A NEW ERA