Develop Powerful Big Data Applications Easily with SpringXD

© 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Mark Fisher & Mark Pollack

Transcript of Develop Powerful Big Data Applications Easily with SpringXD

Page 1: Develop Powerful Big Data Applications Easily with SpringXD

Develop Powerful Big Data Applications Easily with SpringXD

Mark Fisher & Mark Pollack

Page 2: Develop Powerful Big Data Applications Easily with SpringXD

Speakers

Mark Fisher

• Spring XD – Co Lead

• Spring Integration

• Spring Framework

• Spring AMQP

Mark Pollack

• Spring XD – Co Lead

• Spring Data

• Spring Framework

• Spring .NET

Page 3: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD

XD = eXtreme Data


Page 4: Develop Powerful Big Data Applications Easily with SpringXD

“One stop shop for developing and deploying Big Data Applications”

Page 5: Develop Powerful Big Data Applications Easily with SpringXD

What is a Big Data Application?

Page 6: Develop Powerful Big Data Applications Easily with SpringXD

Big Data Architecture

[Diagram: data sources (Files, Social, Sensors, Mobile); Spring XD (Ingest, Stream Processing, Analytics, Workflow Orchestration, Export, Predictive Modeling) with the XD> shell; Master Dataset; Batch Views; Real-Time Views; three Spring Boot applications.]

Page 7: Develop Powerful Big Data Applications Easily with SpringXD

Lambda Architecture

[Diagram: Speed Layer, Batch Layer, Serving Layer; data sources (Files, Social, Sensors, Mobile); Spring XD (Ingest, Stream Processing, Analytics, Workflow Orchestration, Export, Predictive Modeling) with the XD> shell; Master Dataset; Batch Views; Real-Time Views; three Spring Boot applications.]

Page 8: Develop Powerful Big Data Applications Easily with SpringXD

[Diagram: Speed Layer, Batch Layer, Serving Layer, with GemFire XD shown in two places; data sources (Files, Social, Sensors, Mobile); Spring XD (Ingest, Stream Processing, Analytics, Workflow Orchestration, Export, Predictive Modeling) with the XD> shell; Master Dataset; Batch Views; Real-Time Views; three Spring Boot applications.]

Page 9: Develop Powerful Big Data Applications Easily with SpringXD

Spring IO Platform

IO Execution

• Boot: Bootable, Minimal, Ops-Ready

• Grails: Full-stack, Web

• XD: Stream, Taps, Jobs

IO Coordination

• Spring Cloud

IO Foundation

• Integration: Channels, Adapters, Filters, Transformers

• Batch: Jobs, Steps, Readers, Writers

• Big Data: Ingestion, Export, Orchestration, Hadoop

• Web: Controllers, REST, WebSocket

• Data: Relational Data Access, Non-Relational Data Access

• Spring Core: Framework, Security, Groovy, Reactor

Page 10: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD: Unified Platform for Big Data

[Diagram: the Spring XD Runtime and the XD shell (>_). Labels: Streams, Jobs, ingest, workflow, export, taps, bidirectional, Compute, HDFS, RDBMS, NoSQL, Redis, R, SAS, Predictive Modelling.]

Page 11: Develop Powerful Big Data Applications Easily with SpringXD

Streams

Spring XD

• Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP

• Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator

• Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters
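
A stream chains modules of the kinds listed on this slide: an input such as HTTP, processing steps such as Filter and Transformer, and an output such as Log. The following is only an in-process sketch of that pipes-and-filters idea in Python; the function names are hypothetical and this is not the Spring XD API.

```python
from typing import Callable, Iterable, Iterator, List


def source(items: Iterable[str]) -> Iterator[str]:
    """Stand-in for an input module: emits payloads one at a time."""
    for item in items:
        yield item


def filter_module(predicate: Callable[[str], bool],
                  upstream: Iterator[str]) -> Iterator[str]:
    """Stand-in for a Filter: passes on only payloads matching the predicate."""
    return (payload for payload in upstream if predicate(payload))


def transform_module(fn: Callable[[str], str],
                     upstream: Iterator[str]) -> Iterator[str]:
    """Stand-in for a Transformer: applies fn to every payload."""
    return (fn(payload) for payload in upstream)


def sink(upstream: Iterator[str], out: List[str]) -> None:
    """Stand-in for an output module: collects payloads into a list."""
    for payload in upstream:
        out.append(payload)


def run_stream(payloads: Iterable[str]) -> List[str]:
    """Roughly 'source | filter | transform | sink', wired by hand."""
    collected: List[str] = []
    stage = source(payloads)
    stage = filter_module(lambda p: "error" in p, stage)
    stage = transform_module(str.upper, stage)
    sink(stage, collected)
    return collected
```

Each stage only knows about the payloads it receives, which is what lets modules be recombined freely in a stream definition.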

Page 12: Develop Powerful Big Data Applications Easily with SpringXD

Demo: Spring XD - Streams

Unless otherwise indicated, these slides are © 2013-2014 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

Page 13: Develop Powerful Big Data Applications Easily with SpringXD

Taps

Spring XD

• “Listen” to data on another stream
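
To illustrate what "listening" means, here is a minimal in-process sketch, assuming a tap receives a copy of each payload while the original stream keeps delivering to its own sink. The `Stream` class and its methods are hypothetical stand-ins, not Spring XD classes.

```python
from typing import Callable, List


class Stream:
    """Hypothetical stand-in for a stream that can be tapped."""

    def __init__(self, sink: Callable[[str], None]) -> None:
        self._sink = sink
        self._taps: List[Callable[[str], None]] = []

    def add_tap(self, listener: Callable[[str], None]) -> None:
        """Register a listener that sees every payload sent on this stream."""
        self._taps.append(listener)

    def send(self, payload: str) -> None:
        # Each tap gets the payload; the primary flow is unchanged.
        for listener in self._taps:
            listener(payload)
        self._sink(payload)
```

For example, a tap could feed payload lengths into a counter while the tapped stream continues to write the payloads to its normal destination.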

Page 14: Develop Powerful Big Data Applications Easily with SpringXD

Analytics

• Counters and Gauges

– Simple & Field Value Counter

• How many tweets for #java?

– Aggregate Counter

• How many tweets for #java in the week/day/hour?

– Gauge & Rich Gauge

• How many requests per minute?

• Abstract API. Implemented in

– In-Memory

– Redis
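
The counter types above can be sketched in memory as follows. This is an illustrative Python sketch only: the class names mirror the slide, but the methods, the bucket key format, and the choice of hour and day buckets are assumptions rather than the Spring XD analytics API.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import DefaultDict


class FieldValueCounter:
    """Counts occurrences of each distinct field value, e.g. each hashtag."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def increment(self, value: str) -> None:
        self.counts[value] += 1


class AggregateCounter:
    """Counts events into time buckets (here: per hour and per day)."""

    def __init__(self) -> None:
        self.by_hour: DefaultDict[str, int] = defaultdict(int)
        self.by_day: DefaultDict[str, int] = defaultdict(int)

    def increment(self, when: datetime) -> None:
        # The bucket key formats are an arbitrary choice for this sketch.
        self.by_hour[when.strftime("%Y-%m-%d %H")] += 1
        self.by_day[when.strftime("%Y-%m-%d")] += 1
```

Swapping the dictionaries for a shared store such as Redis is what would make the same counts visible across processes.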

Page 15: Develop Powerful Big Data Applications Easily with SpringXD

Demo: Spring XD - Taps

Page 16: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD Runtime

[Diagram: the XD Shell sends HTTP POST /streams/aStream "M1 | M2" to the XD Admin (leader). Also shown: two more XD Admin instances, ZooKeeper (container state), two XD Containers, and the Data Transport.]

Page 17: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD Runtime

[Diagram: the XD Shell sends HTTP POST /streams/aStream "M1 | M2" to the XD Admin (leader). Module M1 now appears in a Spring App Context. Also shown: two more XD Admin instances, ZooKeeper (container state), two XD Containers, and the Data Transport.]

Page 18: Develop Powerful Big Data Applications Easily with SpringXD

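Spring XD Runtime

[Diagram: the XD Shell sends HTTP POST /streams/aStream "M1 | M2" to the XD Admin (leader). Modules M1 and M2 are now both deployed (Spring App Context shown). Also shown: two more XD Admin instances, ZooKeeper (container state), two XD Containers, and the Data Transport.]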

Page 19: Develop Powerful Big Data Applications Easily with SpringXD

Deployment Manifest

Spring XD

Page 20: Develop Powerful Big Data Applications Easily with SpringXD

Deployment Manifest

Spring XD

• The stream/job definition defines the logical view of processing

• The deployment manifest defines the physical view of processing

• Important properties relate to module count and data partitioning

xd:>stream create test1 --definition "http | transform --expression=payload.toUpperCase() | log"

xd:>stream deploy --name test1 --properties "module.transform.count=3"
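
The logical/physical split can be made concrete with a small sketch that expands a definition plus a properties string into a list of module instances. It is written in Python under several assumptions: the pipe-splitting parser is naive (it would break on a `|` inside an expression), a module without a count property is assumed to get one instance, and the `name#index` labels are invented for illustration.

```python
from typing import Dict, List


def parse_definition(definition: str) -> List[str]:
    """Split a pipe-delimited definition into module names (options dropped)."""
    return [segment.strip().split()[0] for segment in definition.split("|")]


def parse_counts(properties: str) -> Dict[str, int]:
    """Read 'module.<name>.count=<n>' entries; other properties are ignored."""
    counts: Dict[str, int] = {}
    for entry in properties.split(","):
        key, _, value = entry.partition("=")
        parts = key.strip().split(".")
        if len(parts) == 3 and parts[0] == "module" and parts[2] == "count":
            counts[parts[1]] = int(value)
    return counts


def physical_view(definition: str, properties: str) -> List[str]:
    """Expand the logical definition into one entry per module instance."""
    counts = parse_counts(properties)
    return [
        f"{name}#{index}"
        for name in parse_definition(definition)
        for index in range(counts.get(name, 1))  # assumed default: one instance
    ]
```

Applied to the `test1` stream above, the logical view has three modules while the physical view has five instances, three of them for `transform`.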

Page 21: Develop Powerful Big Data Applications Easily with SpringXD

Deployment Manifest – Data Partitioning

Spring XD

stream create words --definition "http | splitter --expression=payload.split(' ') | log"

stream deploy words --properties module.splitter.producer.partitionKeyExpression=payload,module.log.count=2

http post --data "How much wood would a woodchuck chuck if a woodchuck could chuck wood"

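The effect of a partition key can be sketched as follows: each word produced by the splitter is used as the key, and a deterministic hash of the key picks one of the two `log` instances, so repeated words always reach the same instance. This Python sketch uses CRC32 purely as an illustrative hash; the hash function Spring XD actually applies is not stated on the slide, and the function names here are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index with a deterministic hash.

    CRC32 is an assumption made for this sketch only.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count


def route(sentence: str, partition_count: int) -> Dict[int, List[str]]:
    """Split a sentence on spaces and group the words by target partition."""
    routed: Dict[int, List[str]] = defaultdict(list)
    for word in sentence.split(" "):
        routed[partition_for(word, partition_count)].append(word)
    return dict(routed)
```

Routing the woodchuck sentence with a partition count of 2 sends every occurrence of "wood", "woodchuck", and "chuck" to a single consumer each, which is the property that keyed processing relies on.
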
Page 22: Develop Powerful Big Data Applications Easily with SpringXD

Demo: Spring XD - Partitioning

Page 23: Develop Powerful Big Data Applications Easily with SpringXD

Distributed, Fault Tolerant Runtime

Spring XD

Page 24: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: the XD Shell sends HTTP POST /streams/aStream "M1 | M2" to the XD Admin (leader). Modules M1 and M2 are deployed across two XD Containers (Spring App Context shown). Also shown: two more XD Admin instances, ZooKeeper (container state), and the Data Transport.]

Page 25: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: only one XD Container is shown, hosting M2; M1 is no longer shown. Also shown: XD Shell (HTTP POST /streams/aStream "M1 | M2"), XD Admin (leader), two more XD Admin instances, ZooKeeper (container state), and the Data Transport.]

Page 26: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: the single XD Container now hosts both M2 and M1. Also shown: XD Shell (HTTP POST /streams/aStream "M1 | M2"), XD Admin (leader), two more XD Admin instances, ZooKeeper (container state), and the Data Transport.]

Page 27: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: only two XD Admin instances are shown, one of them marked as leader. One XD Container hosts M2 and M1. Also shown: XD Shell, ZooKeeper (container state), and the Data Transport.]

Page 28: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: two XD Containers are shown; M2 and M1 remain deployed. Also shown: XD Shell, two XD Admin instances (one marked as leader), ZooKeeper (container state), and the Data Transport.]

Page 29: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: three XD Containers are shown; M2 and M1 remain deployed. Also shown: XD Shell, two XD Admin instances (one marked as leader), ZooKeeper (container state), and the Data Transport.]

Page 30: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: three XD Admin instances are shown, one marked as leader, along with three XD Containers; M2 and M1 remain deployed. Also shown: XD Shell, ZooKeeper (container state), and the Data Transport.]

Page 31: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD – Runtime – Fault Tolerance

[Diagram: the XD Shell sends HTTP POST /streams/aStream "M3 | M4". Modules M1, M2, M3, and M4 are deployed across three XD Containers. Also shown: three XD Admin instances (one marked as leader), ZooKeeper (container state), and the Data Transport.]
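
The container-failure part of this sequence (a container disappears and its module reappears on a surviving container) can be sketched as a reassignment step. This Python sketch is an assumption-heavy illustration: the least-loaded placement policy, the tie-break by container name, and the `redeploy` function are all hypothetical and are not taken from the slides or from Spring XD.

```python
from typing import Dict, List


def redeploy(assignments: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Return new assignments after the container `failed` has departed.

    Orphaned modules go to the least-loaded surviving container
    (hypothetical policy; ties are broken by container name).
    """
    survivors = {c: list(mods) for c, mods in assignments.items() if c != failed}
    if not survivors:
        raise RuntimeError("no containers left to host modules")
    for module in assignments.get(failed, []):
        target = min(survivors, key=lambda c: (len(survivors[c]), c))
        survivors[target].append(module)
    return survivors
```

With two containers hosting M1 and M2 respectively, losing the first leaves the second hosting M2 and M1, matching the order in which the modules appear on the slides.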

Page 32: Develop Powerful Big Data Applications Easily with SpringXD

Predictive Models

Spring XD

Page 33: Develop Powerful Big Data Applications Easily with SpringXD

Predictive Models

Page 34: Develop Powerful Big Data Applications Easily with SpringXD

Concepts

• Model

– Parameterized algorithm

• Model Building

– Derive a parameterized algorithm from the data

– Slow process. Done offline, as a batch process, due to the amount of data involved

• Model Scoring

– Use the model to predict new information

– Fast process. Can be done as part of stream processing
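
The split between building and scoring can be illustrated with the simplest possible model, a straight line fitted by ordinary least squares. This Python sketch is an illustration chosen for brevity, not the model used in the talk; `build_model` and `score` are hypothetical names.

```python
from typing import Sequence, Tuple

Model = Tuple[float, float]  # (slope, intercept): the model's parameters


def build_model(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Model building: derive the parameters from the whole data set (batch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def score(model: Model, x: float) -> float:
    """Model scoring: constant-time prediction for one new event."""
    slope, intercept = model
    return slope * x + intercept
```

Building touches every training record, whereas scoring is a single multiply and add, which is why scoring fits inside a stream and building is left to a batch job.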

Page 35: Develop Powerful Big Data Applications Easily with SpringXD

PMML

• Predictive Model Markup Language

• XML interchange format for analytical models

• From the Data Mining Group http://www.dmg.org

• Describes data processing steps as well as models

• Supported by statistics and data mining tools

– R/Rattle, SAS Enterprise Miner, SPSS, Weka

• Java Evaluator API

– JPMML-Evaluator project

– Provides model scoring
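
To give a feel for what a scoring engine does with such a file, the sketch below scores a record against a heavily simplified, PMML-like regression fragment. It is written in Python with the standard XML parser and is only an illustration: a real PMML document also carries an XML namespace, a Header, a DataDictionary, and a MiningSchema, and in a Java application the JPMML-Evaluator project would do this work. The fragment and the `score_pmml` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Simplified fragment: namespace and mandatory PMML sections are omitted.
PMML_FRAGMENT = """
<PMML version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
      <NumericPredictor name="z" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def score_pmml(pmml_text: str, record: Dict[str, float]) -> float:
    """Evaluate intercept + sum(coefficient * field value) for one record."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("./RegressionModel/RegressionTable")
    if table is None:
        raise ValueError("no RegressionTable found")
    result = float(table.get("intercept", "0"))
    for predictor in table.findall("NumericPredictor"):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result
```

Because the model lives in an interchange file, it can be rebuilt offline in a statistics tool and swapped in without changing the scoring code.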

Page 36: Develop Powerful Big Data Applications Easily with SpringXD

Demo: Spring XD – Predictive Models

Page 37: Develop Powerful Big Data Applications Easily with SpringXD

Spring XD: Unified Platform for Big Data

[Diagram: the Spring XD Runtime and the XD shell (>_). Labels: Streams, Jobs, ingest, workflow, export, taps, bidirectional, Compute, HDFS, RDBMS, NoSQL, Redis, R, SAS, Predictive Modelling.]

Page 38: Develop Powerful Big Data Applications Easily with SpringXD

Jobs

Spring XD

CSV to JDBC

FTP to HDFS

JDBC to HDFS

HDFS to JDBC

HDFS to MongoDB
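
As a rough illustration of what a job like "CSV to JDBC" does, the sketch below reads CSV rows in chunks and writes each chunk to a relational table in its own transaction, in the reader/writer style mentioned on the Spring IO Platform slide. It is a Python sketch using SQLite as a stand-in for a JDBC target; the table name, column names, chunk size, and function names are all assumptions made for the example.

```python
import csv
import io
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[str, int]


def read_chunks(csv_text: str, chunk_size: int) -> Iterator[List[Row]]:
    """Reader: parse CSV with 'name' and 'count' columns into chunks of rows."""
    chunk: List[Row] = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        chunk.append((record["name"], int(record["count"])))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def run_job(csv_text: str, connection: sqlite3.Connection, chunk_size: int = 2) -> int:
    """Writer: insert each chunk in its own transaction; return rows written."""
    connection.execute("CREATE TABLE IF NOT EXISTS words (name TEXT, count INTEGER)")
    written = 0
    for chunk in read_chunks(csv_text, chunk_size):
        with connection:  # commits on success, rolls back on error
            connection.executemany("INSERT INTO words VALUES (?, ?)", chunk)
        written += len(chunk)
    return written
```

Committing per chunk keeps memory use bounded and means a failure loses at most one chunk of work.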

Page 39: Develop Powerful Big Data Applications Easily with SpringXD

Learn More…

Spring XD

• Project: http://projects.spring.io/spring-xd/

• GitHub: https://github.com/spring-projects/spring-xd/

• Issues: https://jira.springsource.org/browse/XD

• Wiki: https://github.com/spring-projects/spring-xd/wiki

• Samples: https://github.com/spring-projects/spring-xd-samples