BIG DATA != HADOOP - Uni...

Post on 14-Jun-2020

BIG DATA != HADOOP
Moving the Cheese for the Industry

Agenda

BTW2017- ©dibuco GmbH 2

1 Observation about Hadoop adoption in industry

2 Big Data has moved on

3 Business case on finding the cheese

4 Conclusions

Andreas Tönne

• CTO of dibuco
• Special interests in
  • Cloud and Big Data architecture
  • Business requirements for the new world
  • Programming language design

Observation: View of Big Data

1st of Its Kind

Massive Batch Processing

Highly Scalable HDFS Storage

Impressive! Must Have!

Big Data Adoption = Hadoop Adoption

Observation: Hadoop is the Golden Hammer

Companies that have invested in Hadoop

• look for Hadoop solutions for every Big Data problem
• think it is obvious to store Big Data in HDFS
• think it is obvious to use MapReduce

Find Hadoop!

What Makes the Industry Cling onto Hadoop?

"Hadoop adoption is driven primarily by technology executives, especially those in the C-suite, including the CFO or COO." (Gartner)

Large investment! (Often) small return
• Time and personnel – missing elsewhere
• Skills – requirements and algorithms for MapReduce?
• IT operations – how to operate this alien?

Who dares to sack this investment driven by the C**?

Still Valid – Big Data = Hadoop?

Data: Gartner Hadoop Adoption Study 2015

54% No investment
18% Invest in 2 years
26% Have Hadoop skills
57% Lack of skills
49% Looking for value in Hadoop
70% Hadoop 1-20 users
4% Hadoop zero users
28% Hadoop single user

Simpler programming model?
Simpler management and backup?
Approachable UI for analysis?

There are Good Reasons to Say – No!

“One of the core value propositions of Hadoop is that it is a lower cost option to traditional information infrastructure,” Heudecker (Gartner)...

“However, the low numbers of users relative to the cost of cluster hardware, as well as any software support costs, may mean Hadoop is failing to live up to this promise.”

That was 2015 – What about today?

Big Data Evolved - Where Is Your Cheese Now?

Gartner Hype Cycle 2016 – What goes missing? Big Data!
• Understanding of Big Data value matured

Hadoop: Digging through huge batches of data (data lake)

Spark, Flink, Storm, Hana: Streaming (IoT), machine learning (big in 2017), natural language (Watson), ...

Big Data Evolved – Batch is Only One Use Case

MapReduce is an excellent fit for embarrassingly parallel data processing
• .... real life is not embarrassingly parallel
• It's a stream of events, linked by history

Time is a critical factor for Big Data value gain
• Hadoop has a huge latency
• Batches are a historic extract of the real-life stream of data

NB: HDFS storage can still be an excellent choice
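The contrast between embarrassingly parallel batches and a history-linked event stream can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the sample documents, the events, the 30-second timeout, and the names `map_count`, `merge`, and `session_gaps` are all hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample data, not taken from the talk.
documents = ["big data is not hadoop", "hadoop is batch", "data is a stream"]

# Batch, embarrassingly parallel: every document is counted on its own (map)
# and the partial results can be merged in any order (reduce).
def map_count(doc):
    return Counter(doc.split())

def merge(left, right):
    return left + right

batch_result = reduce(merge, map(map_count, documents), Counter())

# Stream linked by history: the answer for an event depends on what was
# seen before, so the events of one user cannot be handled independently.
def session_gaps(events, timeout=30):
    """Yield (user, timestamp, starts_new_session) for time-ordered events."""
    last_seen = {}
    for user, ts in events:
        previous = last_seen.get(user)
        yield user, ts, previous is None or ts - previous > timeout
        last_seen[user] = ts

events = [("a", 0), ("b", 5), ("a", 10), ("b", 20), ("a", 100)]
stream_result = list(session_gaps(events))
```

The word count needs no shared state and could run on any number of workers, while `session_gaps` has to carry `last_seen` from event to event, which is the kind of dependency a batch extract loses.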

Decision Process for Digital Transformation

Digital Disruptors

New Business Goals

Big Data Strategy Requirements

Technology Choice

Solution

Start here!

Not here: wrong driver of goals

Business Case

Business Case – Getting Away From Hadoop

• Best possible linkage of data by textual contents
• Fast availability of new data
• Dealing with "language" changes

DB

Web

File

Events

Live Updates

Business Case – Upfront Technology Decisions

• Hadoop for everything global
  • Maintaining identity constraints
  • Keeping global language statistics current
  • Mass import and maintenance

• Single NoSQL DB choice (Titan)
  • Graph DB matched logical data model perfectly

• Microservice/queuing architecture for the rest

The Technology Started Biting Our Goals

• Business value harmed by hard scaling problems
  • Single data swamp storage
  • Storage model inefficient
  • Oops, we built a monolith!

• Problems solved by Hadoop batch invasion
  • Time was running out. Literally!

It was time to...

Concurrency/Distribution

Accuracy of linkage

Consistency requirements

Scaling

Think Again!

Rethinking the Solution

Data: Split by service and usage needs

Data: Consistency requirements reasonable?

Data: Distribution of creation and usage?

Requirements: Balance of requirements and scalability

Requirements: Find scalable algorithmic solution or bin requirement

Business Goals: What are the risks?

Legal Considerations: What is allowed and accepted?

Data Governance: E.g. data ownership, IAM system vs. scalability

💡 Insights 💡
• We have a streaming situation
• Time is of high importance
• Ideal consistency requirements can be reduced
• New algorithms allow reducing the data model storage

Cost of storage

Throughput

Response time

Scalability

Solution

• True microservice architecture with polyglot persistence
  • The best technology and model for each service

• Modular streaming architecture
  • Multiple streaming topologies cut by service

• New streaming and big-data-optimized analysis algorithms
  • A lot of ad-hoc computation instead of global, aging information

Outcome

• Better results by ad-hoc computation 👍
• Incremental maintenance of aging linguistic information 👍
• Massive reduction of storage requirements (est. up to 70-80%) 👍
• True horizontal scalability by service 👍
• Complete removal of Hadoop batches 👍
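What "incremental maintenance of aging linguistic information" could look like is sketched below. The talk does not describe its algorithm, so everything here is an assumption for illustration: exponential decay as the aging model, the `half_life` parameter, and the class name `DecayingTermStats` are hypothetical.

```python
import math

class DecayingTermStats:
    """Term weights that age exponentially and are updated per event.

    Assumption: exponential decay stands in for whatever aging model
    the original system used; the talk does not specify one.
    """

    def __init__(self, half_life):
        # A weight halves after `half_life` time units without new sightings.
        self.decay = math.log(2) / half_life
        self.weights = {}  # term -> (weight, time of last update)

    def _aged(self, term, now):
        weight, last = self.weights.get(term, (0.0, now))
        return weight * math.exp(-self.decay * (now - last))

    def observe(self, text, now):
        # Only the terms in this event are touched; there is no global recount.
        for term in text.lower().split():
            self.weights[term] = (self._aged(term, now) + 1.0, now)

    def weight(self, term, now):
        # Aging is applied lazily at read time.
        return self._aged(term, now)

stats = DecayingTermStats(half_life=10.0)
stats.observe("cheese moved", now=0.0)
stats.observe("cheese", now=10.0)
```

Each event updates only the terms it contains, and old information fades on its own, so no periodic batch job is needed to keep the statistics current.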

Conclusions

• Hadoop is seen as the role model of Big Data (our observation)

• Investment in Hadoop is one of the reasons to stick with Hadoop
  • We observe that problems are crafted to be solvable by Hadoop

• Expectations for Big Data evolved beyond batch processing

• Allow rethinking of the business goals and solution requirements without technology in mind

THANK YOU FOR YOUR ATTENTION!

Franz-Schubert Straße 75
70195 Stuttgart
+4971169947560
info@dibuco.de
www.dibuco.de

Sources

• Cheese theme: "Who Moved My Cheese?: An Amazing Way to Deal with Change in Your Work and in Your Life" (Spencer Johnson), G.P. Putnam's Sons; 1 edition (September 8, 1998)

• Cheese picture, Wikimedia Commons (Christian Bauer)

• Big Data Landscape 2016 (C) Matt Turk, Jim Hao, FirstMark Capital

• Hammer: Malene Thyssen, http://commons.wikimedia.org/wiki/User:Malene

• Pocket watch, Wikimedia Commons (No user listed)
