Develop Powerful Big Data Applications Easily with Spring XD
© 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.
Develop Powerful Big Data Applications Easily with Spring XD
Mark Fisher & Mark Pollack
Speakers
Mark Fisher
• Spring XD – Co-Lead
• Spring Integration
• Spring Framework
• Spring AMQP
Mark Pollack
• Spring XD – Co-Lead
• Spring Data
• Spring Framework
• Spring .NET
Spring XD
XD = eXtreme Data
“One stop shop for developing and deploying Big Data Applications”
What is a Big Data Application?
Big Data Architecture
[Diagram: data sources (files, social, sensors, mobile); Spring XD with its shell (XD>) providing ingest, stream processing, analytics, workflow orchestration, and export; predictive modeling; a master dataset; batch views and real-time views; and Spring Boot applications.]
Lambda Architecture
[Diagram: the same components grouped into a speed layer, a batch layer, and a serving layer, with GemFire XD added.]
Spring IO Platform
• IO Execution
– Boot: Bootable, Minimal, Ops-Ready
– Grails: Full-stack, Web
– XD: Stream, Taps, Jobs
• IO Coordination
– Spring Cloud
• IO Foundation
– Web: Controllers, REST, WebSocket
– Integration: Channels, Adapters, Filters, Transformers
– Batch: Jobs, Steps, Readers, Writers
– Big Data: Ingestion, Export, Orchestration, Hadoop
– Data: Relational Data Access, Non-Relational Data Access
– Spring Core: Framework, Security, Groovy, Reactor
Spring XD: Unified Platform for Big Data
[Diagram: the Spring XD runtime, driven from the shell (>_), runs streams (ingest, taps) and jobs (workflow, export), with bidirectional connections to compute, HDFS, RDBMS, NoSQL, Redis, and predictive modelling tools (R, SAS).]
Streams
• Sources
– HTTP, Tail, File, Mail, Twitter, GemFire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
• Processors
– Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator
• Sinks
– File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, GemFire, Splunk, MQTT, Dynamic Router, Counters
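To make the source, processor, and sink roles concrete, here is a minimal Python sketch of a three-stage pipeline. It is only a conceptual analogy, not Spring XD code: the function names (`http_source`, `transform`, `log_sink`, `run_stream`) are hypothetical and chosen to echo the module names on the slide.

```python
from typing import Callable, Iterable, Iterator, List


def http_source(payloads: Iterable[str]) -> Iterator[str]:
    """Stand-in for a source module: emits each posted payload as a message."""
    for payload in payloads:
        yield payload


def transform(messages: Iterable[str], fn: Callable[[str], str]) -> Iterator[str]:
    """Stand-in for a processor module: applies fn to every message."""
    for message in messages:
        yield fn(message)


def log_sink(messages: Iterable[str]) -> List[str]:
    """Stand-in for a sink module: consumes the stream and records each message."""
    logged = []
    for message in messages:
        logged.append(message)
    return logged


def run_stream(payloads: Iterable[str]) -> List[str]:
    # Loosely analogous to a stream defined as "http | transform | log",
    # with the transform step upper-casing the payload.
    return log_sink(transform(http_source(payloads), str.upper))
```

Each stage only sees the output of the previous one, which is the property that lets a runtime place the stages in different processes and connect them over a transport.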
Demo: Spring XD - Streams
Unless otherwise indicated, these slides are © 2013-2014 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Taps
• “Listen” to data on another stream
Analytics
• Counters and Gauges
– Simple & Field Value Counter: how many tweets for #java?
– Aggregate Counter: how many tweets for #java in the week/day/hour?
– Gauge & Rich Gauge: how many requests per minute?
• Abstract API, implemented
– In-Memory
– Redis
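The tap and counter ideas above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the Spring XD analytics API: `FieldValueCounter`, `AggregateCounter`, and `run_with_tap` are hypothetical names, and the hourly bucket format is an arbitrary choice made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable


class FieldValueCounter:
    """Counts occurrences of each distinct value of one field (e.g. a hashtag)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def increment(self, value: str) -> None:
        self.counts[value] += 1


class AggregateCounter:
    """Counts events per hour bucket so totals can be rolled up by day."""

    def __init__(self) -> None:
        self.by_hour: Dict[str, int] = defaultdict(int)

    def increment(self, when: datetime) -> None:
        # Bucket key looks like "2014-09-08 10" (date plus hour).
        self.by_hour[when.strftime("%Y-%m-%d %H")] += 1

    def total_for_day(self, day: str) -> int:
        return sum(n for bucket, n in self.by_hour.items() if bucket.startswith(day))


def run_with_tap(
    messages: Iterable[dict],
    sink: Callable[[dict], None],
    tap: Callable[[dict], None],
) -> None:
    """Deliver every message to the main sink and copy it to the tap."""
    for message in messages:
        sink(message)
        tap(message)
```

The main stream is unaware of the tap: the sink receives exactly the same messages whether or not a counter is listening.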
Demo: Spring XD - Taps
Spring XD Runtime
[Diagram, built up in three steps: the XD Shell sends HTTP POST /streams/aStream "M1 | M2" to the leader XD Admin; two further XD Admins and the container state are held in ZooKeeper; M1 and then M2 are deployed into the XD Containers, each inside a Spring application context, and exchange messages over the data transport.]
Deployment Manifest
• The stream/job definition defines the logical view of processing
• The deployment manifest defines the physical view of processing
• Important properties relate to module count and data partitioning
xd:>stream create test1 --definition "http | transform --expression=payload.toUpperCase() | log"
xd:>stream deploy --name test1 --properties "module.transform.count=3"
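As a rough illustration of the logical versus physical split, the sketch below combines a pipe-separated definition with `module.<name>.count` properties to produce an instance count per module. The parsing is deliberately naive and hypothetical: it is not how Spring XD parses its DSL, it would break on expressions that contain `|` or `,`, and the default of one instance per module is an assumption made for the example.

```python
from typing import Dict, List


def parse_definition(definition: str) -> List[str]:
    """Return module names from a pipe-separated stream definition."""
    return [part.strip().split()[0] for part in definition.split("|")]


def parse_properties(properties: str) -> Dict[str, int]:
    """Extract module.<name>.count=<n> entries from a deployment property string."""
    counts: Dict[str, int] = {}
    for entry in properties.split(","):
        key, _, value = entry.partition("=")
        parts = key.strip().split(".")
        if len(parts) == 3 and parts[0] == "module" and parts[2] == "count":
            counts[parts[1]] = int(value)
    return counts


def physical_plan(definition: str, properties: str) -> Dict[str, int]:
    """Combine the logical definition with the manifest (assumed default: 1)."""
    counts = parse_properties(properties)
    return {name: counts.get(name, 1) for name in parse_definition(definition)}
```

For the `test1` stream on the slide, this yields one `http` instance, three `transform` instances, and one `log` instance.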
Deployment Manifest – Data Partitioning
stream create words --definition "http | splitter --expression=payload.split(' ') | log"
stream deploy words --properties "module.splitter.producer.partitionKeyExpression=payload,module.log.count=2"
http post --data "How much wood would a woodchuck chuck if a woodchuck could chuck wood"
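The effect of a partition key can be sketched as follows: each word is its own key, and a stable hash of the key picks one of the two `log` instances, so repeated words always reach the same instance. This is a conceptual sketch only. The use of CRC32 as the hash is an assumption for illustration, and `partition_for` and `route_words` are hypothetical names, not part of Spring XD.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a consumer index with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def route_words(sentence: str, partition_count: int) -> Dict[int, List[str]]:
    """Split a sentence into words and group them by the partition they map to."""
    routed: Dict[int, List[str]] = defaultdict(list)
    for word in sentence.split(" "):
        routed[partition_for(word, partition_count)].append(word)
    return dict(routed)
```

With the woodchuck sentence, both occurrences of `wood`, `chuck`, `woodchuck`, and `a` land in the same partition as their earlier occurrence, which is what makes per-key processing such as counting safe to run in parallel.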
Demo: Spring XD - Partitioning
Distributed, Fault Tolerant Runtime
Spring XD – Runtime – Fault Tolerance
[Diagram sequence: a stream "M1 | M2" runs with M1 and M2 deployed in XD Containers, and ZooKeeper holds the container state. The container hosting M1 disappears and M1 is redeployed to a remaining container. The leader XD Admin disappears and another XD Admin becomes leader. Additional XD Containers and an XD Admin then join, and a second stream, "M3 | M4", is deployed across the available containers.]
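The container failover shown in the sequence can be approximated with a toy model: an admin-like object tracks which modules run in which container and, when a container is lost, redeploys its modules to the remaining ones. Everything here is hypothetical (the `Cluster` class, the least-loaded placement rule, the container names); it illustrates the idea rather than Spring XD's actual placement logic or its use of ZooKeeper.

```python
from typing import Dict, List


class Cluster:
    """Toy model of an admin that tracks containers and redeploys modules."""

    def __init__(self, containers: List[str]) -> None:
        self.containers: Dict[str, List[str]] = {name: [] for name in containers}

    def add_container(self, name: str) -> None:
        self.containers[name] = []

    def _least_loaded(self) -> str:
        if not self.containers:
            raise RuntimeError("no containers available")
        # Fewest modules first; ties are broken by container name.
        return min(self.containers, key=lambda n: (len(self.containers[n]), n))

    def deploy(self, modules: List[str]) -> None:
        for module in modules:
            self.containers[self._least_loaded()].append(module)

    def container_lost(self, name: str) -> None:
        """Remove a failed container and redeploy the modules it was hosting."""
        orphaned = self.containers.pop(name)
        self.deploy(orphaned)
```

A short usage example: starting with containers `c1` and `c2`, deploying `["M1", "M2"]` places one module in each; calling `container_lost("c1")` leaves both modules running in `c2`.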
Predictive Models
Concepts
• Model
– Parameterized algorithm
• Model Building
– Derive a parameterized algorithm from the data
– Slow process. Done offline, as a batch process, due to the amount of data involved
• Model Scoring
– Use the model to predict new information
– Fast process. Can be done as part of stream processing
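The split between building and scoring can be shown with the simplest possible model, a straight line fitted by least squares. This is a sketch chosen for brevity, not the technique used in the talk's demo: `build_model` stands for the slow batch step over historical data, and `score_stream` for the fast per-event step inside a stream. Both names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Model = Tuple[float, float]  # (slope, intercept): the model's parameters


def build_model(history: List[Tuple[float, float]]) -> Model:
    """Model building: fit y = slope * x + intercept by least squares (batch)."""
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in history)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def score_stream(model: Model, events: Iterable[float]) -> Iterator[float]:
    """Model scoring: apply the fitted parameters to each incoming event."""
    slope, intercept = model
    for x in events:
        yield slope * x + intercept
```

Building touches every historical record, while scoring is one multiplication and one addition per event, which is why only scoring fits naturally into stream processing.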
PMML
• Predictive Model Markup Language
• XML interchange format for analytical models
• From the Data Mining Group http://www.dmg.org
• Processing + models
• Supported by statistics and data mining tools
– R/Rattle, SAS Enterprise Miner, SPSS, Weka
• Java Evaluator API
– JPMML-Evaluator project
– Provides model scoring
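To give a feel for what an XML interchange format for models looks like, the sketch below reads a hand-written fragment shaped like a PMML regression model and scores a record with it. The fragment is simplified for illustration: a real PMML document also carries a namespace, a header, a data dictionary, and a mining schema, and real scoring on the JVM would go through an evaluator such as JPMML-Evaluator rather than code like this. `load_model` and `score` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Simplified, hand-written fragment in the shape of a PMML RegressionModel.
# Not a complete PMML document (no namespace, Header, or DataDictionary).
PMML_FRAGMENT = """
<PMML>
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_model(pmml_text: str) -> Tuple[float, Dict[str, float]]:
    """Read the intercept and per-field coefficients from the fragment."""
    table = ET.fromstring(pmml_text).find("./RegressionModel/RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients


def score(model: Tuple[float, Dict[str, float]], record: Dict[str, float]) -> float:
    """Score one record: intercept plus the weighted sum of its fields."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())
```

Because the parameters live in a document rather than in code, a model built in one tool can be scored by another without retraining.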
Demo: Spring XD – Predictive Models
Spring XD: Unified Platform for Big Data
[Diagram repeated from earlier: the Spring XD runtime with streams, taps, and jobs connecting to compute, HDFS, RDBMS, NoSQL, Redis, and predictive modelling tools (R, SAS).]
Jobs
• CSV to JDBC
• FTP to HDFS
• JDBC to HDFS
• HDFS to JDBC
• HDFS to MongoDB
Learn More…
• Project: http://projects.spring.io/spring-xd/
• GitHub: https://github.com/spring-projects/spring-xd/
• Issues: https://jira.springsource.org/browse/XD
• Wiki: https://github.com/spring-projects/spring-xd/wiki
• Samples: https://github.com/spring-projects/spring-xd-samples