BIG DATA != HADOOP Moving the Cheese for the Industry

Page 1: BIG DATA != HADOOP - Uni Stuttgartbtw2017.informatik.uni-stuttgart.de/slidesandpapers/H1-11-32/slides.… · Big Data Evolved – Batch is onlyOne Use Case MapReduce is an excellent

Agenda

BTW2017- ©dibuco GmbH 2

1 Observation about Hadoop adoption in industry

2 Big Data has moved on

3 Business case on finding the cheese

4 Conclusions

Andreas Tönne

• CTO of dibuco
• Special interests in
  • Cloud and Big Data architecture
  • Business requirements for the new world
  • Programming language design

Observation: View of Big Data

1st of its kind

Massive batch processing

Impressive! Must have!

Highly scalable HDFS storage

Big Data adoption = Hadoop adoption

Observation: Hadoop is the Golden Hammer

Companies that have invested in Hadoop
• look for Hadoop solutions for every Big Data problem
• think it is obvious to store Big Data in HDFS
• think it is obvious to use MapReduce

Find Hadoop!

What Makes the Industry Cling onto Hadoop?

"Hadoop adoption is driven primarily by technology executives, especially those in the C-suite, including the CFO or COO." (Gartner)

Large investment! (Often) small return
• Time and personnel – missing elsewhere
• Skills – requirements and algorithms for MapReduce?
• IT operations – how to operate this alien?

Who dares to sack this investment driven by the C**?

Still Valid – Big Data = Hadoop?

Data: Gartner Hadoop Adoption Study 2015

• 54%: no investment
• 18%: invest in 2 years
• 26%: have Hadoop skills
• 57%: lack of skills
• 49%: looking for value in Hadoop
• 70%: Hadoop with 1-20 users
• 4%: Hadoop with zero users
• 28%: Hadoop with a single user

Open questions:
• Simpler programming model?
• Simpler management and backup?
• Approachable UI for analysis?

There are Good Reasons to Say – No!

“One of the core value propositions of Hadoop is that it is a lower cost option to traditional information infrastructure,” Heudecker (Gartner) ... “However, the low numbers of users relative to the cost of cluster hardware, as well as any software support costs, may mean Hadoop is failing to live up to this promise.”

That was 2015 – What about today?

Big Data Evolved - Where Is Your Cheese Now?

Gartner Hype Cycle 2016 – What goes missing? Big Data!
• Understanding of Big Data value matured

Hadoop

Spark, Flink, Storm, Hana

Digging through huge batches of data (data lake)

Streaming (IoT), machine learning (big in 2017), natural language (Watson), ...

Big Data Evolved – Batch is only One Use Case

MapReduce is an excellent fit for embarrassingly parallel data processing
• ... real life is not embarrassingly parallel
• It's a stream of events, linked by history

Time is a critical factor for Big Data value gain
• Hadoop has a huge latency
• Batches are a historic extract of the real-life stream of data

NB: HDFS storage can still be an excellent choice
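The contrast between batch and stream processing can be illustrated with a minimal Python sketch. It is not taken from the talk: the word-count task and the function names `batch_word_count` and `streaming_word_count` are assumptions chosen for illustration. The batch variant only produces a result once the whole input has been processed, while the streaming variant updates its state per event and can be queried after every event.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(documents: Iterable[str]) -> Counter:
    """Batch style: map each document independently, then reduce once."""
    # Map step: every document is counted on its own (embarrassingly parallel).
    mapped = (Counter(doc.lower().split()) for doc in documents)
    # Reduce step: merge the partial counts; the result exists only at the end.
    total: Counter = Counter()
    for partial in mapped:
        total.update(partial)
    return total


def streaming_word_count(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: keep running state and emit a snapshot per event."""
    state: Counter = Counter()
    for event in events:
        state.update(event.lower().split())
        # Yield a copy so that earlier snapshots are not mutated later.
        yield Counter(state)


docs = ["big data", "data lake"]
batch_result = batch_word_count(docs)
snapshots = list(streaming_word_count(docs))
```

Both variants end with the same counts. The difference is when a result becomes available: the streaming variant already has a usable answer after the first event.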

Decision Process For Digital Transformation

Digital Disruptors

New Business Goals

Big Data Strategy Requirements

Technology Choice

Solution

Start here!

Not here: wrong driver of goals

Business Case

Business Case – Getting Away From Hadoop

• Best possible linkage of data by textual contents
• Fast availability of new data
• Dealing with "language" changes

Diagram labels: DB, Web, File, Events, Live Updates

Business Case – Upfront Technology Decisions

• Hadoop for everything global
  • Maintaining identity constraints
  • Keeping global language statistics current
  • Mass import and maintenance

• Single NoSQL DB choice (Titan)
  • Graph DB matched logical data model perfectly

• Microservice/queuing architecture for the rest

The Technology Started Biting our Goals

• Business value harmed by hard scaling problems
  • Single data swamp storage
  • Storage model inefficient
  • Oops, we built a monolith!

• Problems solved by Hadoop batch invasion
  • Time was running out. Literally!

It was time to...

Diagram labels: Concurrency/Distribution, Accuracy of linkage, Consistency requirements, Scaling

Think Again!

Rethinking the Solution

• Data: Split by service and usage needs
• Data: Consistency requirements reasonable?
• Data: Distribution of creation and usage?
• Requirements: Balance of requirements and scalability
• Requirements: Find scalable algorithmic solution or bin requirement
• Business Goals: What are the risks?
• Legal Considerations: What is allowed and accepted?
• Data Governance: E.g. data ownership, IAM system vs. scalability

💡 Insights 💡

• We have a streaming situation
• Time is of high importance
• Ideal consistency requirements can be reduced
• New algorithms allow the data model storage to be reduced

Diagram labels: Cost of storage, Throughput, Response time, Scalability

Solution

• True microservice architecture with polyglot persistence
  • The best technology and model for each service

• Modular streaming architecture
  • Multiple streaming topologies cut by service

• New streaming and big-data-optimized analysis algorithms
  • A lot of ad-hoc computation instead of global, aging information
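A minimal Python sketch of the idea behind "polyglot persistence" and "topologies cut by service". It is not the architecture from the talk: `EventBus` is a hypothetical in-process stand-in for a real queue, and `LinkageService` and `LanguageStatsService` are invented names. The point shown is that each service consumes the same event stream but owns a store whose model fits its own job.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, Dict, List, Set

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Hypothetical in-process stand-in for a message queue."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every service subscribed to this topic.
        for handler in self._handlers[topic]:
            handler(event)


class LinkageService:
    """Owns an inverted index (term -> document ids) for linking documents."""

    def __init__(self) -> None:
        self.docs_by_term: DefaultDict[str, Set[str]] = defaultdict(set)

    def on_document(self, event: Event) -> None:
        for term in event["text"].lower().split():
            self.docs_by_term[term].add(event["id"])


class LanguageStatsService:
    """Owns plain term counts; a different store model for a different need."""

    def __init__(self) -> None:
        self.term_counts: Counter = Counter()

    def on_document(self, event: Event) -> None:
        self.term_counts.update(event["text"].lower().split())


bus = EventBus()
linkage = LinkageService()
stats = LanguageStatsService()
bus.subscribe("documents", linkage.on_document)
bus.subscribe("documents", stats.on_document)
bus.publish("documents", {"id": "a", "text": "Hadoop batch"})
bus.publish("documents", {"id": "b", "text": "Hadoop stream"})
```

Because neither service reads the other's store, each one could be scaled or moved to a different storage technology independently.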

Outcome

• Better results by ad-hoc computation 👍
• Incremental maintenance of aging linguistic information 👍
• Massive reduction of storage requirements (est. up to 70-80%) 👍
• True horizontal scalability by service 👍

• Complete removal of Hadoop batches 👍
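The slides do not say how the aging linguistic information was maintained. As one possible illustration only, the sketch below assumes exponential decay with a configurable half-life; the class `AgingTermStats` and its methods are hypothetical. Each observation updates a single term in constant time, so no global batch recomputation is needed.

```python
import math
from typing import Dict


class AgingTermStats:
    """Term weights that decay over time (assumed scheme: exponential decay)."""

    def __init__(self, half_life: float) -> None:
        # After `half_life` time units, an unrefreshed weight has halved.
        self.decay_rate = math.log(2) / half_life
        self.weights: Dict[str, float] = {}
        self.last_seen: Dict[str, float] = {}

    def weight(self, term: str, now: float) -> float:
        """Return the decayed weight of `term` at time `now`."""
        if term not in self.last_seen:
            return 0.0
        age = now - self.last_seen[term]
        return self.weights[term] * math.exp(-self.decay_rate * age)

    def observe(self, term: str, timestamp: float) -> None:
        """Incremental update; assumes timestamps arrive in non-decreasing order."""
        self.weights[term] = self.weight(term, timestamp) + 1.0
        self.last_seen[term] = timestamp


term_stats = AgingTermStats(half_life=10.0)
term_stats.observe("cheese", 0.0)
weight_after_one_half_life = term_stats.weight("cheese", 10.0)
term_stats.observe("cheese", 10.0)
weight_after_refresh = term_stats.weight("cheese", 10.0)
```

Only the decayed weight and the last timestamp are stored per term, which is one way such statistics can stay current without keeping the full history.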

Conclusions

• Hadoop is seen as the role model of Big Data (our observation)

• Investment in Hadoop is one of the reasons to stick with Hadoop
• We observe that problems are crafted to be solvable by Hadoop

• Expectations for Big Data evolved beyond batch processing

• Allow rethinking of the business goals and solution requirements without technology in mind

THANK YOU FOR YOUR ATTENTION!

Franz-SchubertStraße [email protected]

Sources

• Cheese theme: "Who Moved My Cheese?: An Amazing Way to Deal with Change in Your Work and in Your Life" (Spencer Johnson), G. P. Putnam's Sons; 1st edition (September 8, 1998)

• Cheese picture: Wikimedia Commons (Christian Bauer)

• Big Data Landscape 2016: (C) Matt Turck, Jim Hao, FirstMark Capital

• Hammer: Malene Thyssen, http://commons.wikimedia.org/wiki/User:Malene

• Pocket watch: Wikimedia Commons (no user listed)