s2gx2015 who needs batch

Post on 25-Jan-2017

429 views 0 download

Transcript of s2gx2015 who needs batch

SPRINGONE2GXWASHINGTON, DC

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/l icenses/by-nc/3.0/

Who Needs Batch?By Michael Minella and Gunnar Hillert

@michaelminella, @ghillert

Michael MinellaTwitter: @michaelminella

Podcast: http://javaOffHeap.com or @OffHeapWebsite: http://spring.io

Gunnar HillertTwitter: @ghillert

Website: http://blog.hillert.comConference: http://devnexus.com

https://github.com/ghillert/s1-2gx2015-who-needs-batch

W E H AV E A NANNOUNCEMENT

WE HAVE A DISEASE

OBJECTIVIUS SHINIUMSYNDROMUS

Shiny Object SyndromSHINY OBJECT

SYNDROME

IT MAKES USINNOVATE

ROAD TO RECOVERY

COOL KIDS ARE DOINGSTREAM PROCESSING

Stream Analytics

Cloud Dataflow

LET’S TAKE ACLOSER LOOK

BATCH AND STREAMDEFINITIONS

LOOK AT COMMONUSE CASES

CONCLUSIONS

DEFINITION:STREAM PROCESSING

A system for processing an unbounded set of data in an asynchronous manor.

DEFINITION:BATCH PROCESSING

DEPENDS ON WHO YOU ASK

“Batch … applications run … as special cases of stream processing applications”

- Apache Flink Documentation

Batch processing is the processing ofa bounded data set without interruption

or interaction

KEYDIFFERENCES

1BOUNDED VSUNBOUNDED

2SYNCHRONOUS VSASYNCHRONOUS

3STATEFUL VSSTATELESS

STATE OFSTREAMING

Stream Analytics

Cloud Dataflow

Stream Analytics

Cloud Dataflow

STATE OFSTREAMING

JSR-352

WHY THE END OF BATCH?

LATENCY

WHAT DOES THE END OF BATCH MEAN?

LIMIT LATENCY

“ENTERPRISE GRADE”

JSR-352

LOOK AT COMMONUSE CASES

E.T.L.

DATABASE TO HDFS

DEMO

STREAMING CAN BE USEFULIN INGESTION USE CASES

TWITTER TO HDFS

DATA SCIENCE

BATCH TRAININGPREDICTIVE MODELS

CONNECTEDCAR

transformerhttp filter hdfs

type-transformer

python

Gemfire

REST

Hadoop

Gemfire

STREAMING APROXIMIATIONS EXIST≈

BREADTH AND ACCURACYDON’T MATCH BATCH≠

WORKFLOWORCHESTRATION

forEachDir

start

getDirectoryInfo

join

dirSubProcss dirSubProcss…

Condition 1

start

getDirAgeAndNumberOfFiles

Ingest

Condition 2

sendReminderEmail

ArchiveEnd

DEMO

HOW IS THIS DONEVIA STREAMING?

NON-INTERACTIVEPROCESSING

SAME AS ETLPROCESSING

PORT SCANNER

DEMO

STREAMING DOESN’T MAKE SENSE

STREAMING DOES MAKE SENSE

WHERE LATENCYIS A PRIORITY

DATA LOSSIS ACCEPTABLE

UNBOUNDEDDATASET

BUT DOES NOTIN OTHER USE CASES

HIGHLYCOMPLEX

ERROR HANDLINGNOT AS ROBUST

DATA GUARANTEESNOT AS ROBUST

WHERE BATCH IS BETTER

FINITEDATASET

ROBUST ERROR HANDLING

RESOURCESARE A CONCERN

NOT EITHER OR

COMBINING THE TWO PROVIDES THE MOST POWER

E.T.L.v2

mailhttp mysql2hdfs

Database to HDFS v2

DEMO

BATCH HAS LASTEDBECAUSE IT MAKSE SENSE

SPRING XD/DATA FLOWBEST OF BOTH WORLDS

ENTERPRISE GRADESTREAM PROCESSING

BEST BATCH OPTIONON THE JVM

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/l icenses/by-nc/3.0/ 88

Check out Spring Cloud Data Flow on https://spring.io

Apache Spark for Big Data Processing – Salon N-P Up next!High Performance Stream Processing – Salon N-P 8:30AM TomorrowSpring Integration Extensions Ecosystem – Here! 12:45 Tomorrow

Learn More. Stay Connected.

@springcentral Spring.io/video