s2gx2015 who needs batch
-
Upload
gunnar-hillert -
Category
Technology
-
view
429 -
download
0
Transcript of s2gx2015 who needs batch
SPRINGONE2GXWASHINGTON, DC
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/l icenses/by-nc/3.0/
Who Needs Batch?By Michael Minella and Gunnar Hillert
@michaelminella, @ghillert
Michael MinellaTwitter: @michaelminella
Podcast: http://javaOffHeap.com or @OffHeapWebsite: http://spring.io
Gunnar HillertTwitter: @ghillert
Website: http://blog.hillert.comConference: http://devnexus.com
https://github.com/ghillert/s1-2gx2015-who-needs-batch
W E H AV E A NANNOUNCEMENT
WE HAVE A DISEASE
OBJECTIVIUS SHINIUMSYNDROMUS
Shiny Object SyndromSHINY OBJECT
SYNDROME
IT MAKES USINNOVATE
ROAD TO RECOVERY
COOL KIDS ARE DOINGSTREAM PROCESSING
Stream Analytics
Cloud Dataflow
LET’S TAKE ACLOSER LOOK
BATCH AND STREAMDEFINITIONS
LOOK AT COMMONUSE CASES
CONCLUSIONS
DEFINITION:STREAM PROCESSING
A system for processing an unbounded set of data in an asynchronous manor.
DEFINITION:BATCH PROCESSING
DEPENDS ON WHO YOU ASK
“Batch … applications run … as special cases of stream processing applications”
- Apache Flink Documentation
Batch processing is the processing ofa bounded data set without interruption
or interaction
KEYDIFFERENCES
1BOUNDED VSUNBOUNDED
2SYNCHRONOUS VSASYNCHRONOUS
3STATEFUL VSSTATELESS
STATE OFSTREAMING
Stream Analytics
Cloud Dataflow
Stream Analytics
Cloud Dataflow
STATE OFSTREAMING
JSR-352
WHY THE END OF BATCH?
LATENCY
WHAT DOES THE END OF BATCH MEAN?
LIMIT LATENCY
“ENTERPRISE GRADE”
JSR-352
LOOK AT COMMONUSE CASES
E.T.L.
DATABASE TO HDFS
DEMO
STREAMING CAN BE USEFULIN INGESTION USE CASES
TWITTER TO HDFS
DATA SCIENCE
BATCH TRAININGPREDICTIVE MODELS
CONNECTEDCAR
transformerhttp filter hdfs
type-transformer
python
Gemfire
REST
Hadoop
Gemfire
STREAMING APROXIMIATIONS EXIST≈
BREADTH AND ACCURACYDON’T MATCH BATCH≠
WORKFLOWORCHESTRATION
forEachDir
start
getDirectoryInfo
join
dirSubProcss dirSubProcss…
Condition 1
start
getDirAgeAndNumberOfFiles
Ingest
Condition 2
sendReminderEmail
ArchiveEnd
DEMO
HOW IS THIS DONEVIA STREAMING?
NON-INTERACTIVEPROCESSING
SAME AS ETLPROCESSING
PORT SCANNER
DEMO
STREAMING DOESN’T MAKE SENSE
STREAMING DOES MAKE SENSE
WHERE LATENCYIS A PRIORITY
DATA LOSSIS ACCEPTABLE
UNBOUNDEDDATASET
BUT DOES NOTIN OTHER USE CASES
HIGHLYCOMPLEX
ERROR HANDLINGNOT AS ROBUST
DATA GUARANTEESNOT AS ROBUST
WHERE BATCH IS BETTER
FINITEDATASET
ROBUST ERROR HANDLING
RESOURCESARE A CONCERN
NOT EITHER OR
COMBINING THE TWO PROVIDES THE MOST POWER
E.T.L.v2
mailhttp mysql2hdfs
Database to HDFS v2
DEMO
BATCH HAS LASTEDBECAUSE IT MAKSE SENSE
SPRING XD/DATA FLOWBEST OF BOTH WORLDS
ENTERPRISE GRADESTREAM PROCESSING
BEST BATCH OPTIONON THE JVM
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/l icenses/by-nc/3.0/ 88
Check out Spring Cloud Data Flow on https://spring.io
Apache Spark for Big Data Processing – Salon N-P Up next!High Performance Stream Processing – Salon N-P 8:30AM TomorrowSpring Integration Extensions Ecosystem – Here! 12:45 Tomorrow
Learn More. Stay Connected.
@springcentral Spring.io/video