Big Data Applications Made Easy: Fact Or Fiction?
Pivotal Confidential–Internal Use Only
Spring XD
Glenn Renfro
grenfro@pivotal.io
@CPPWFS
Volume
Variety
Velocity
Veracity
420 million wearables
90% of enterprise data is unstructured
60-100 sensors in each car
22 billion sensors by 2020
86% suspect data inaccuracy
30% revenue loss due to bad data quality
500 million tweets each day
2.3 trillion GBs of data each day
Data Points: McKinsey, Twitter, Gartner, IBM
Batch and Streaming often handled by multiple platforms
Fragmented Big Data Ecosystem
Not all data is Hadoop-bound
“One stop shop for developing and deploying Big Data Applications”
SPRING XD: EXTREME DATA
Spring XD to the Rescue
Portable: on-prem, YARN, EC2, PCF, Mesos, Docker etc.
Easy to Use, Extend and Integrate with other Technologies
Built on proven Spring EAI and Batch projects
(Volume, Velocity, Veracity, and Variety)
Unified Stream and Batch Operations
Hadoop Batch Workflow Orchestration
Predictive Analytics and Model Scoring
[Diagram: Spring platform stack]
IO EXECUTION
BOOT: Bootable, Minimal, Ops-Ready
GRAILS: Full-stack, Web
XD: Stream, Taps, Jobs
IO COORDINATION
SPRING CLOUD
IO FOUNDATION
WEB: Controllers, REST, WebSocket
INTEGRATION: Channels, Adapters, Filters, Transformers
BATCH: Jobs, Steps, Readers, Writers
BIG DATA: Ingestion, Export, Orchestration, Hadoop
DATA: Relational Data Access, Non-Relational Data Access
CORE: Framework, Security, Groovy, Reactor
Spring XD - 10,000 Foot View
Streams
Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Groovy, Python, Java
Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters
Create a stream with http as a source and hdfs as a sink. The hdfs --rollover option is set to a small value so that we can read the file on HDFS.
Spring XD - Distributed Runtime
[Diagram labels]
XD Shell: HTTP POST /streams/aStream “M1 | M2”
XD Admin (leader), XD Admin, XD Admin
ZooKeeper: Container State
Message Bus
XD Container, XD Container
Spring App Context: M1, M2
Spring XD - Analytics
• Counters and Gauges
  • Simple & Field Value Counter (how many tweets for #java?)
  • Aggregate Counter (how many tweets for #java in the week/day/hr?)
  • Gauge & Rich Gauge (how many requests / minute?)
  • Abstract API, implemented in Redis and in-memory
• Predictive Model Evaluation
  • JPMML
  • Is this transaction fraudulent?
  • What group does this user belong to?
  • Interoperable with R, Rattle, KNIME, RapidMiner, MADLib
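The difference between a field value counter and an aggregate counter can be illustrated with a minimal in-memory sketch. The class names, methods, and sample tweets below are hypothetical and made up for illustration; this is not the Spring XD counter API, only one plausible way to model the two ideas.

```python
from collections import Counter, defaultdict
from datetime import datetime


class FieldValueCounter:
    """Counts occurrences of each distinct value of one field (e.g. a hashtag)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, value):
        self.counts[value] += 1


class AggregateCounter:
    """Keeps one total per hour bucket; coarser totals are summed on demand."""

    def __init__(self):
        self.per_hour = defaultdict(int)

    def record(self, when):
        bucket = when.replace(minute=0, second=0, microsecond=0)
        self.per_hour[bucket] += 1

    def total_for_day(self, day):
        return sum(n for hour, n in self.per_hour.items() if hour.date() == day)


# Made-up sample data: (hashtag, time the tweet was seen).
tweets = [
    ("#java", datetime(2015, 3, 1, 9, 15)),
    ("#java", datetime(2015, 3, 1, 9, 45)),
    ("#spring", datetime(2015, 3, 1, 10, 5)),
    ("#java", datetime(2015, 3, 2, 8, 0)),
]

by_tag = FieldValueCounter()
java_over_time = AggregateCounter()
for tag, when in tweets:
    by_tag.record(tag)
    if tag == "#java":
        java_over_time.record(when)

print(by_tag.counts["#java"])                                     # 3
print(java_over_time.per_hour[datetime(2015, 3, 1, 9)])           # 2
print(java_over_time.total_for_day(datetime(2015, 3, 1).date()))  # 2
```

Storing only the finest bucket and summing on demand keeps writes cheap; a real store might instead maintain separate totals per resolution.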
Jobs
CSV to JDBC
FTP to HDFS
JDBC to HDFS
HDFS to JDBC
HDFS to MongoDB
[Architecture diagram. Data sources: MOBILE, SENSORS, SOCIAL, FILES. Spring XD functions: Ingest, Stream Processing, Analytics, Predictive Modeling, Workflow Orchestration, Export. Layers: SPEED LAYER, BATCH LAYER, SERVING LAYER. Other labels: REALTIME VIEWS, BATCH VIEWS, MASTER DATASET, GemFire XD, Spring Boot (three instances), PCF - BOSH Service, PCF - Apps.]
Unified runtime for both Real-time and Batch use cases
Scalable, Distributed and Fault Tolerant Runtime
Increased Productivity through out-of-the-box components
Closed Loop Analytics through online (stream) and offline (batch) data
Swiss-army knife of data movement and data pipelines
Repeatable ‘turnkey’ solution for next generation data-centric use cases
Agility: Easy to Set Up and Run
Writing HTTP Data to HDFS
…that simple!
Spring XD on YARN
Spring XD Running on YARN!
Copies Files to HDFS
Creates manifest.yml
Spring Boot App
‘xd-yarn start admin’
Spring Boot App
‘xd-yarn start container’
Spring Boot App
Even easier with PCF
Natural Fit: Reactive Streaming Pipelines
Moving Average
‘collect values every 500ms’
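As a rough illustration of "collect values every 500ms", the sketch below groups timestamped samples into fixed 500 ms windows and averages each one. This is a tumbling-window average, a simplification of a true moving average; the function name and the sample data are hypothetical, and timestamps are passed in explicitly rather than read from a clock so the example stays deterministic.

```python
from itertools import groupby


def windowed_averages(samples, window_ms=500):
    """Group (timestamp_ms, value) pairs into fixed windows and average each.

    Assumes samples arrive ordered by timestamp.
    """
    averages = []
    for window, group in groupby(samples, key=lambda s: s[0] // window_ms):
        values = [value for _, value in group]
        averages.append((window * window_ms, sum(values) / len(values)))
    return averages


# Made-up samples: (timestamp in ms, measured value).
samples = [(0, 10.0), (200, 20.0), (499, 30.0), (500, 40.0), (900, 60.0), (1200, 5.0)]
result = windowed_averages(samples)
print(result)  # [(0, 20.0), (500, 50.0), (1000, 5.0)]
```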
Non-Blocking Backpressure
OLD: “take all these items I have whether you can handle them or not”
NEW: “give me the next N available items”
Microbatching
‘either 1024b or 350ms; trigger downstream processing’
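The "either 1024b or 350ms" rule can be sketched as a small buffer that flushes when either limit is hit. The `MicroBatcher` class below is hypothetical and is not taken from Reactor or Spring XD. It also assumes a simplification: the age limit is only checked when a new item arrives (a real implementation would normally use a timer), and the caller supplies the current time so the behavior is reproducible.

```python
class MicroBatcher:
    """Buffers items and flushes when a size or age limit is reached."""

    def __init__(self, on_batch, max_bytes=1024, max_age_ms=350):
        self.on_batch = on_batch
        self.max_bytes = max_bytes
        self.max_age_ms = max_age_ms
        self.buffer = []
        self.buffered_bytes = 0
        self.first_item_ms = None

    def add(self, item, now_ms):
        # Flush a batch that has grown too old before accepting the new item.
        if self.buffer and now_ms - self.first_item_ms >= self.max_age_ms:
            self.flush()
        if not self.buffer:
            self.first_item_ms = now_ms
        self.buffer.append(item)
        self.buffered_bytes += len(item)
        # Flush as soon as the size limit is reached.
        if self.buffered_bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        """Hand the current batch downstream and start a new one."""
        if self.buffer:
            self.on_batch(self.buffer)
        self.buffer = []
        self.buffered_bytes = 0
        self.first_item_ms = None


batches = []
batcher = MicroBatcher(on_batch=batches.append)
batcher.add(b"x" * 600, now_ms=0)
batcher.add(b"y" * 600, now_ms=10)   # 1200 bytes buffered: size trigger
batcher.add(b"z" * 100, now_ms=20)
batcher.add(b"w" * 100, now_ms=400)  # 380 ms after "z": time trigger
batcher.flush()                      # drain what is left

print([len(batch) for batch in batches])  # [2, 1, 1]
```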
Deployment Manifest – Module Count
• http | doWork | hdfs
[Diagram: 2 http instances, 4 doWork instances, 3 hdfs instances]
stream deploy --name s1 --properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3"
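To make the effect of the count properties concrete, the sketch below parses a property string of the form shown above and expands a pipe-delimited stream definition into its module instances. Both helper functions are hypothetical illustrations, not part of Spring XD, and the rule that a module without an explicit count gets one instance is an assumption.

```python
import re


def parse_module_counts(properties):
    """Extract {module: count} from 'module.<name>.count=<n>' entries."""
    counts = {}
    for entry in properties.split(","):
        key, _, value = entry.strip().partition("=")
        match = re.fullmatch(r"module\.(\w+)\.count", key.strip())
        if match:
            counts[match.group(1)] = int(value)
    return counts


def expand_instances(definition, counts):
    """List every module instance for a pipe-delimited stream definition."""
    modules = [m.strip() for m in definition.split("|")]
    # Assumption: a module without an explicit count gets one instance.
    return [f"{m}-{i}" for m in modules for i in range(counts.get(m, 1))]


props = "module.http.count=2, module.doWork.count=4, module.hdfs.count=3"
counts = parse_module_counts(props)
instances = expand_instances("http | doWork | hdfs", counts)

print(counts)          # {'http': 2, 'doWork': 4, 'hdfs': 3}
print(len(instances))  # 9
```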
Deployment Manifest – Module Placement
• http | doWork | hdfs
[Diagram: 2 http instances on containers in the WEB group, 4 doWork instances, 3 hdfs instances]
stream deploy --name s1 --properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3,module.http.criteria=groups.contains('WEB')"
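The placement idea can be sketched as filtering containers by a criteria predicate before assigning module instances. Everything below is a hypothetical model: the `Container` class, the `place` function, and the container names are invented for illustration, the Python predicate stands in for the `groups.contains('WEB')` expression, and round-robin assignment is an assumption rather than a statement of how Spring XD chooses among matching containers.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    groups: set = field(default_factory=set)


def place(module, count, containers, criteria=None):
    """Assign `count` instances of a module round-robin over matching containers."""
    candidates = [c for c in containers if criteria is None or criteria(c)]
    if not candidates:
        raise ValueError(f"no container satisfies the criteria for {module}")
    return [(f"{module}-{i}", candidates[i % len(candidates)].name) for i in range(count)]


def in_web_group(container):
    """Python stand-in for the expression groups.contains('WEB')."""
    return "WEB" in container.groups


containers = [
    Container("c1", {"WEB"}),
    Container("c2", {"WEB", "BATCH"}),
    Container("c3", {"BATCH"}),
]

http_placement = place("http", 2, containers, criteria=in_web_group)
hdfs_placement = place("hdfs", 3, containers)

print(http_placement)  # [('http-0', 'c1'), ('http-1', 'c2')]
print(hdfs_placement)  # [('hdfs-0', 'c1'), ('hdfs-1', 'c2'), ('hdfs-2', 'c3')]
```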
Deployment Manifest – Data Partitioning
• http | doWork | hdfs
[Diagram: 2 http instances in the WEB group, 4 doWork instances, 3 hdfs instances]
stream deploy --name s1 --properties "...,module.http.producer.partitionKeyExpression=payload.customerId"
Each doWork module instance will always process the same set of customer IDs
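Why a partition key keeps each customer on one doWork instance can be shown with a hash-and-modulo sketch. This is a common partitioning scheme assumed here for illustration, not a description of Spring XD's internal selection logic; the `partition_for` function and the sample messages are hypothetical.

```python
import zlib


def partition_for(key, partition_count):
    """Map a partition key to a stable index in [0, partition_count)."""
    # crc32 is used because Python's built-in hash() of str varies per process.
    return zlib.crc32(str(key).encode("utf-8")) % partition_count


# Made-up messages; customerId plays the role of payload.customerId.
messages = [
    {"customerId": "alice", "amount": 10},
    {"customerId": "bob", "amount": 7},
    {"customerId": "alice", "amount": 3},
    {"customerId": "carol", "amount": 1},
    {"customerId": "bob", "amount": 9},
]

DO_WORK_INSTANCES = 4
routed = {}
for message in messages:
    index = partition_for(message["customerId"], DO_WORK_INSTANCES)
    routed.setdefault(message["customerId"], set()).add(index)

# Every customer ID maps to exactly one doWork instance.
print(all(len(indexes) == 1 for indexes in routed.values()))  # True
```

Because the index depends only on the key and the instance count, changing the number of doWork instances would remap customers, which matters for any per-customer state the module keeps.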
Learn More
• Project: http://projects.spring.io/spring-xd/
• GitHub: https://github.com/spring-projects/spring-xd/
• Wiki: https://github.com/spring-projects/spring-xd/wiki
• Samples: https://github.com/spring-projects/spring-xd-samples
A NEW PLATFORM FOR A NEW ERA