s2gx2015 who needs batch

88
SPRINGONE2GX WASHINGTON, DC Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Who Needs Batch? By Michael Minella and Gunnar Hillert @michaelminella, @ghillert

Transcript of s2gx2015 who needs batch

Page 1: s2gx2015 who needs batch

SPRINGONE2GXWASHINGTON, DC

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/l icenses/by-nc/3.0/

Who Needs Batch?By Michael Minella and Gunnar Hillert

@michaelminella, @ghillert

Page 2: s2gx2015 who needs batch
Page 3: s2gx2015 who needs batch

Michael MinellaTwitter: @michaelminella

Podcast: http://javaOffHeap.com or @OffHeapWebsite: http://spring.io

Gunnar HillertTwitter: @ghillert

Website: http://blog.hillert.comConference: http://devnexus.com

Page 4: s2gx2015 who needs batch
Page 5: s2gx2015 who needs batch

https://github.com/ghillert/s1-2gx2015-who-needs-batch

Page 6: s2gx2015 who needs batch

W E H AV E A NANNOUNCEMENT

Page 7: s2gx2015 who needs batch

WE HAVE A DISEASE

Page 8: s2gx2015 who needs batch

OBJECTIVIUS SHINIUMSYNDROMUS

Page 9: s2gx2015 who needs batch

Shiny Object SyndromSHINY OBJECT

SYNDROME

Page 10: s2gx2015 who needs batch

IT MAKES USINNOVATE

Page 11: s2gx2015 who needs batch

ROAD TO RECOVERY

Page 12: s2gx2015 who needs batch

COOL KIDS ARE DOINGSTREAM PROCESSING

Page 13: s2gx2015 who needs batch

Stream Analytics

Cloud Dataflow

Page 14: s2gx2015 who needs batch
Page 15: s2gx2015 who needs batch
Page 16: s2gx2015 who needs batch

LET’S TAKE ACLOSER LOOK

Page 17: s2gx2015 who needs batch
Page 18: s2gx2015 who needs batch

BATCH AND STREAMDEFINITIONS

Page 19: s2gx2015 who needs batch

LOOK AT COMMONUSE CASES

Page 20: s2gx2015 who needs batch

CONCLUSIONS

Page 21: s2gx2015 who needs batch

DEFINITION:STREAM PROCESSING

Page 22: s2gx2015 who needs batch

A system for processing an unbounded set of data in an asynchronous manor.

Page 23: s2gx2015 who needs batch

DEFINITION:BATCH PROCESSING

Page 24: s2gx2015 who needs batch

DEPENDS ON WHO YOU ASK

Page 25: s2gx2015 who needs batch

“Batch … applications run … as special cases of stream processing applications”

- Apache Flink Documentation

Page 26: s2gx2015 who needs batch

Batch processing is the processing ofa bounded data set without interruption

or interaction

Page 27: s2gx2015 who needs batch

KEYDIFFERENCES

Page 28: s2gx2015 who needs batch

1BOUNDED VSUNBOUNDED

Page 29: s2gx2015 who needs batch

2SYNCHRONOUS VSASYNCHRONOUS

Page 30: s2gx2015 who needs batch

3STATEFUL VSSTATELESS

Page 31: s2gx2015 who needs batch

STATE OFSTREAMING

Page 32: s2gx2015 who needs batch

Stream Analytics

Cloud Dataflow

Page 33: s2gx2015 who needs batch

Stream Analytics

Cloud Dataflow

Page 34: s2gx2015 who needs batch

STATE OFSTREAMING

Page 35: s2gx2015 who needs batch

JSR-352

Page 36: s2gx2015 who needs batch

WHY THE END OF BATCH?

Page 37: s2gx2015 who needs batch

LATENCY

Page 38: s2gx2015 who needs batch

WHAT DOES THE END OF BATCH MEAN?

Page 39: s2gx2015 who needs batch

LIMIT LATENCY

Page 40: s2gx2015 who needs batch

“ENTERPRISE GRADE”

Page 41: s2gx2015 who needs batch

JSR-352

Page 42: s2gx2015 who needs batch

LOOK AT COMMONUSE CASES

Page 43: s2gx2015 who needs batch

E.T.L.

Page 44: s2gx2015 who needs batch

DATABASE TO HDFS

Page 45: s2gx2015 who needs batch

DEMO

Page 46: s2gx2015 who needs batch

STREAMING CAN BE USEFULIN INGESTION USE CASES

Page 47: s2gx2015 who needs batch

TWITTER TO HDFS

Page 48: s2gx2015 who needs batch

DATA SCIENCE

Page 49: s2gx2015 who needs batch

BATCH TRAININGPREDICTIVE MODELS

Page 50: s2gx2015 who needs batch

CONNECTEDCAR

Page 51: s2gx2015 who needs batch

transformerhttp filter hdfs

type-transformer

python

Gemfire

REST

Hadoop

Gemfire

Page 52: s2gx2015 who needs batch

STREAMING APROXIMIATIONS EXIST≈

Page 53: s2gx2015 who needs batch

BREADTH AND ACCURACYDON’T MATCH BATCH≠

Page 54: s2gx2015 who needs batch

WORKFLOWORCHESTRATION

Page 55: s2gx2015 who needs batch
Page 56: s2gx2015 who needs batch

forEachDir

start

getDirectoryInfo

join

dirSubProcss dirSubProcss…

Page 57: s2gx2015 who needs batch

Condition 1

start

getDirAgeAndNumberOfFiles

Ingest

Condition 2

sendReminderEmail

ArchiveEnd

Page 58: s2gx2015 who needs batch

DEMO

Page 59: s2gx2015 who needs batch

HOW IS THIS DONEVIA STREAMING?

Page 60: s2gx2015 who needs batch

NON-INTERACTIVEPROCESSING

Page 61: s2gx2015 who needs batch

SAME AS ETLPROCESSING

Page 62: s2gx2015 who needs batch

PORT SCANNER

Page 63: s2gx2015 who needs batch

DEMO

Page 64: s2gx2015 who needs batch

STREAMING DOESN’T MAKE SENSE

Page 65: s2gx2015 who needs batch

STREAMING DOES MAKE SENSE

Page 66: s2gx2015 who needs batch

WHERE LATENCYIS A PRIORITY

Page 67: s2gx2015 who needs batch

DATA LOSSIS ACCEPTABLE

Page 68: s2gx2015 who needs batch

UNBOUNDEDDATASET

Page 69: s2gx2015 who needs batch

BUT DOES NOTIN OTHER USE CASES

Page 70: s2gx2015 who needs batch

HIGHLYCOMPLEX

Page 71: s2gx2015 who needs batch

ERROR HANDLINGNOT AS ROBUST

Page 72: s2gx2015 who needs batch

DATA GUARANTEESNOT AS ROBUST

Page 73: s2gx2015 who needs batch

WHERE BATCH IS BETTER

Page 74: s2gx2015 who needs batch

FINITEDATASET

Page 75: s2gx2015 who needs batch

ROBUST ERROR HANDLING

Page 76: s2gx2015 who needs batch

RESOURCESARE A CONCERN

Page 77: s2gx2015 who needs batch

NOT EITHER OR

Page 78: s2gx2015 who needs batch

COMBINING THE TWO PROVIDES THE MOST POWER

Page 79: s2gx2015 who needs batch

E.T.L.v2

Page 80: s2gx2015 who needs batch

mailhttp mysql2hdfs

Database to HDFS v2

Page 81: s2gx2015 who needs batch

DEMO

Page 82: s2gx2015 who needs batch
Page 83: s2gx2015 who needs batch

BATCH HAS LASTEDBECAUSE IT MAKSE SENSE

Page 84: s2gx2015 who needs batch

SPRING XD/DATA FLOWBEST OF BOTH WORLDS

Page 85: s2gx2015 who needs batch

ENTERPRISE GRADESTREAM PROCESSING

Page 86: s2gx2015 who needs batch

BEST BATCH OPTIONON THE JVM

Page 87: s2gx2015 who needs batch
Page 88: s2gx2015 who needs batch

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under aCreative Commons Attribution-NonCommercial license: http://creativecommons.org/l icenses/by-nc/3.0/ 88

Check out Spring Cloud Data Flow on https://spring.io

Apache Spark for Big Data Processing – Salon N-P Up next!High Performance Stream Processing – Salon N-P 8:30AM TomorrowSpring Integration Extensions Ecosystem – Here! 12:45 Tomorrow

Learn More. Stay Connected.

@springcentral Spring.io/video