MASS HDFS: Multi-Agent Spatial Simulation Hadoop
Distributed File System Yun-Ming Shih
Capstone Project Term Report I
Master of Science in Computer Science & Software Engineering
University of Washington
06/17/2017
Project Committee: Munehiro Fukuda, Committee Chair Michael Stiber, Committee Member
Johnny Lin, Committee Member
Background

An increasing amount of data and processing needs is pushing the development of parallelized big data analysis. Most approaches deal with data that has a simple structure, like CSV and SQL. Science data for climate analysis has a complex structure, which is not well supported. To expand the use of parallelized big data analysis within scientific fields, Prof. Fukuda and his research group proposed a multi-agent based method that can process multi-dimensional NetCDF data. They demonstrated its practicality by incorporating it into the University of Washington Climate Analysis (UWCA) web application, which uses NetCDF software with the Parallel-Computing Library for Multi-Agent Spatial Simulation in Java - the MASS Java library.
The original version of MASS UWCA, implemented by Jason Woodring, has one master server that reads all the data from storage and sends them to the slave servers for processing. The issue with this application is the amount of time spent reading large files (in this case, 22 GB). At the time, the slow reading performance was suspected to be caused by the design of having only the master server read and transfer data to the slaves. However, the improvement was far from meeting expectations after manually duplicating the data to each of the slave servers. This suggests the issue may come from the implementation of reading data from the servers themselves into Places for processing.
In Autumn of 2015, a former student, Michael O'Keefe, proposed a solution to improve UWCA performance by adding MASS Parallel I/O to the MASS Java library. Parallel I/O is the MASS Java layer that allows efficient file reading and writing from every slave node to the MASS Places. This layer does not handle file transfer from master to slaves. The implementation made opening, reading, writing, and closing files possible at each slave server, with the assumption that the files already exist there. Although the idea came from improving UWCA read performance, Michael's work has only been tested with MASS Java and hasn't been integrated with UWCA. My proposed project, MASS HDFS, focuses on handling data storage and data transfer. This will be done using the Hadoop Distributed File System (HDFS). In the following sections, I will discuss the literature review I have done in choosing the file system and how I incorporate HDFS with MASS Parallel I/O.
Literature Review

In this phase of the project, I explored Big Data and Hadoop literature to help me understand the topic in depth prior to my Hadoop setup process.
Big Data

Big data can be found in three forms: structured, unstructured, and semi-structured, which contains elements of both (e.g., an XML file). The format of structured data is well known in advance. Like a relational database, it can store, access, and process data in a fixed format. The issue with structured data is that its size grows very large (a zettabyte is one billion terabytes). Unstructured big data refers to any data with an unknown form or structure. Google search results and document processing are examples of unstructured big data. When the size is large, it is difficult to derive value out of it. All forms of big data share the four-V characteristics:
• Volume - the sheer scale of the data
• Variety - heterogeneous sources and the nature of data, both structured and unstructured
• Velocity - speed of data generation (data flows in from business processes, application logs, networks and social media sites, sensors, mobile devices, etc.)
• Variability - the inconsistency which can be shown by the data at times
With big data, businesses can utilize outside intelligence when making decisions, customer service improves over time, risks to products and services can be identified early, and operational efficiency improves. However, in geospatial domains like climate analysis, the data usually have higher-intensity structures like NetCDF. These data, with exponential growth in data relationships, are produced from various sensors distributed over the environment that record physical changes. The data are then accessed and fed into scientific models to simulate and predict the phenomena. Techniques for storing data, accessing it in real time, and handling it remain challenging for big data analysis in this domain.
Hadoop - HDFS

Hadoop is a framework that enables distributed processing of large data across clusters of commodity servers. It is composed of four core components: Hadoop Common, HDFS, MapReduce, and YARN. Hadoop Common is a set of utilities and libraries that can be used by other Hadoop modules or other programs. For example, I am using Hadoop Common to establish the connection between MASS Java Parallel I/O and HDFS. The other core components are introduced in the following sections.

HDFS Architecture: In MASS HDFS, data storage and transfer is handled by HDFS. HDFS is formed of a NameNode, DataNodes, and a Secondary NameNode. It operates on a master-slave architecture model with one namenode and multiple datanodes.
o Namenode is the master of the cluster (UW1-320-03)
   ! Stores metadata and the file directory
   ! Metadata
      • File name, file size, number of blocks, block IDs, user, group, permission, replication, block size, etc.
   ! Metadata is stored in RAM and on disk (it is kept on disk so that if the namenode fails, the information can be recovered from the disk)
   ! The namenode doesn't store actual data (the datanodes do)
   ! The namenode knows whether the datanodes of the entire cluster are active or down
      • Datanodes send a heartbeat every 3 seconds
      • The namenode waits for 10 minutes to determine if a datanode is out of service
o Datanodes are the slave servers (UW1-320-00, 01, 02, 04, 05, 06, 07)
   ! Data are stored as blocks
   ! Block sizes are usually 128 MB
   ! The data get divided first, then stored to datanodes based on the replication factor number
      • Last block size <= block size
   ! Why are blocks replicated? (See the Block Replica Policy section)
      • Reliability
         o If block 1 on datanode 1 fails, you can still get block 1 from datanodes 2 and 5.
         o If datanode 1 itself is down, then the replicas on nodes 2, 3, 4, and 5 are copied to the available nodes so that the number of replicas still matches the replication factor
   ! Datanodes have no knowledge of files in HDFS; they only have knowledge about blocks
   ! Datanodes scan all blocks on disk and generate a block report - the block report has a block version used for append operations
      • Block reports happen at startup and periodically
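Since the last block may be smaller than the others, the number of blocks a file occupies follows directly from the 128 MB default. A minimal sketch of this arithmetic (the class and method names are illustrative helpers, not part of Hadoop's API):

```java
// Sketch: block-count arithmetic implied by the 128 MB block size above.
// BlockMath is a hypothetical helper, not a Hadoop class.
public class BlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB default

    // Number of blocks a file of the given size occupies (ceiling division).
    static long numBlocks(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Size of the final block: always <= BLOCK_SIZE, as noted above.
    static long lastBlockSize(long fileSize) {
        long rem = fileSize % BLOCK_SIZE;
        return rem == 0 ? BLOCK_SIZE : rem;
    }

    public static void main(String[] args) {
        long file = 22L * 1024 * 1024 * 1024;     // the 22 GB UWCA data set
        System.out.println(numBlocks(file));      // 176
        System.out.println(lastBlockSize(file));  // 134217728 (an exact multiple)
    }
}
```

For the 22 GB UWCA file, this works out to exactly 176 full 128 MB blocks.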
HDFS Read Operation: Steps to read a file from HDFS:
o The client must call open(). This makes an RPC call to the namenode to get the block IDs and locations for the first few blocks.
   ! The returned list is sorted by network distance
o The client then directly contacts the datanodes to request transfer of the queried block. If all reads fail, the client contacts the next closest datanode. The same process is repeated until the whole file is transferred block by block.
o This process of requesting a specific block from one datanode after another is concealed from the client application. The client sees this as a continuous stream of data.
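The fall-through behavior in the read steps above can be sketched as a loop over the datanodes in distance order. The readBlock helper and node names below are hypothetical stand-ins, not Hadoop's client code:

```java
import java.util.List;
import java.util.function.Function;

// Sketch of the client-side read fallback described above: try datanodes
// in network-distance order and move to the next one when a read fails.
public class ReadFallback {
    // nodesByDistance is assumed to be the namenode's list, closest first.
    static byte[] readBlock(List<String> nodesByDistance,
                            Function<String, byte[]> fetch) {
        for (String node : nodesByDistance) {
            try {
                return fetch.apply(node);   // direct read from this datanode
            } catch (RuntimeException e) {
                // read failed: fall through to the next closest datanode
            }
        }
        throw new RuntimeException("all replicas unreachable");
    }

    static byte[] throwDead() { throw new RuntimeException("dead node"); }

    public static void main(String[] args) {
        List<String> nodes = List.of("uw1-320-00", "uw1-320-01", "uw1-320-02");
        byte[] data = readBlock(nodes, node ->
            node.equals("uw1-320-00")
                ? throwDead()                 // simulate the closest node down
                : "block-1".getBytes());
        System.out.println(new String(data)); // block-1
    }
}
```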
HDFS Write Operation: Steps to write a file to HDFS:
o The client must call create(). This makes an RPC call to the namenode.
   ! The namenode ensures the file does not already exist and checks the client's write permission.
   ! The client asks the namenode to allocate the file in blocks (128 MB). Based on the replication factor, the namenode returns a list sorted by network distance.
o The client then directly flushes the data to the closest datanode in 4 K packets. That datanode forwards the packet to its closest datanode, and so on. Each datanode sends an acknowledgment message to its requester. This is how the load is distributed in an HDFS cluster.
o When the number of replicas meets the replication factor, the namenode updates the block location mapping. The same procedure is repeated until all blocks are stored in HDFS. The client calls close() to complete writing data to HDFS.
o This process of forwarding packets from one datanode to another is concealed from the client application. The client sees this as a continuous stream of data.
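The store-and-forward pipeline with acknowledgments described above can be sketched as follows; the send helper and datanode names are illustrative, not Hadoop's implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the write pipeline above: the client streams a packet to the
// closest datanode, which forwards it down the chain; each node then
// acknowledges to its requester, so acks flow back upstream.
public class WritePipeline {
    // Forward a packet down the pipeline, recording each hop in `log`,
    // and return true once every node in the chain has acknowledged.
    static boolean send(List<String> pipeline, int hop, List<String> log) {
        if (hop == pipeline.size()) return true;   // end of the chain
        log.add("store@" + pipeline.get(hop));     // node persists the packet
        boolean downstreamAck = send(pipeline, hop + 1, log);
        log.add("ack-from@" + pipeline.get(hop));  // ack flows back upstream
        return downstreamAck;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = send(List.of("dn1", "dn2", "dn3"), 0, log);
        System.out.println(ok);   // true
        System.out.println(log);  // stores going down, acks coming back
    }
}
```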
Block Replica Policy: The block replica placement policy is based on factors of reliability, availability, and network bandwidth utilization. Suppose we have a replication factor of 3, 4 racks, and 4 datanodes:

Scenario 1: When data is written from the outside world to HDFS (copying data into HDFS):
o A datanode is chosen randomly to store the first replica.
o Then, a node from a different rack is chosen to store the second replica.
o The third replica is stored on a different node of the same rack where the second replica is.
o This way, if one rack fails, you will still have another rack available.

Scenario 2: When data is written by some task inside the cluster:
o The first replica is stored on the datanode where the task runs.
o The second and the third are stored on different nodes of a same rack, but a different rack from the rack of the first replica.
o This way, if one rack fails, you will still have another rack available.
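The rack-aware placement in Scenario 1 can be sketched as a simple selection over the cluster's nodes: given the first replica's node, pick the second on a different rack and the third on another node of the second replica's rack. Node and rack names here are hypothetical, and this is not Hadoop's actual placement code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Scenario 1 rack-aware replica placement described above.
public class ReplicaPlacement {
    static class Node {
        final String name, rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
    }

    static List<Node> place(List<Node> cluster, Node first) {
        List<Node> replicas = new ArrayList<>();
        replicas.add(first);                       // first replica: given node
        Node second = null;
        for (Node n : cluster)                     // second: a different rack
            if (!n.rack.equals(first.rack)) { second = n; break; }
        replicas.add(second);
        for (Node n : cluster)                     // third: second's rack,
            if (n.rack.equals(second.rack) && n != second) {
                replicas.add(n); break;            // but a different node
            }
        return replicas;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
            new Node("dn1", "rack1"), new Node("dn2", "rack1"),
            new Node("dn3", "rack2"), new Node("dn4", "rack2"));
        for (Node n : place(cluster, cluster.get(0)))
            System.out.println(n.name + " on " + n.rack);
        // the three replicas span two racks, so losing one rack is survivable
    }
}
```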
Trade-off: If the number of replicas is high, the system is highly reliable and available; however, more network bandwidth is utilized and write efficiency drops (the write operation is expensive because it needs network bandwidth). If the number of replicas is low, the system is less reliable and available, but less network bandwidth is utilized, which gives better write performance for the same reason.
What happens when a DataNode is out of service? If, for some reason, datanode 1 goes down, it will not send a heartbeat to the namenode. The namenode waits 10 minutes for datanode 1 to send its heartbeat and then decides that datanode 1 is out of service. Fortunately, the blocks are still available on other nodes, but the cluster will be under-replicated. As a result, the namenode schedules a job to make more replicas on other datanodes. Then, each new datanode sends a block report to the namenode, and the namenode updates its block location mapping.
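The liveness rule above (3-second heartbeats, a 10-minute dead timeout) reduces to a single timestamp comparison. This is an illustrative sketch, not the namenode's code:

```java
// Sketch of the namenode's liveness check described above: a datanode
// heartbeats every 3 s, and it is declared out of service after 10 min
// of silence.
public class HeartbeatMonitor {
    static final long HEARTBEAT_INTERVAL_MS = 3_000;      // every 3 seconds
    static final long DEAD_TIMEOUT_MS = 10 * 60 * 1_000;  // 10 minutes

    // True if the node's last heartbeat is older than the dead timeout.
    static boolean isOutOfService(long lastHeartbeatMs, long nowMs) {
        return nowMs - lastHeartbeatMs > DEAD_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        long now = 1_000_000;
        System.out.println(isOutOfService(now - 5_000, now));   // false: recent
        System.out.println(isOutOfService(now - 700_000, now)); // true: >10 min
    }
}
```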
Hadoop YARN: YARN is a completely rewritten architecture of the Hadoop cluster. It offers clear advantages in scalability, efficiency, and flexibility compared to the classical MapReduce engine in the first version of Hadoop (MRv1).
Limitations: MRv1's limitations relate to scalability, resource utilization, and the support of workloads different from MapReduce. Job execution is controlled by two types of processes:
• A single master process called the JobTracker, which coordinates all jobs running on the cluster and assigns map and reduce tasks to run on the TaskTrackers.
• A number of subordinate processes called TaskTrackers, which run assigned tasks and periodically report their progress to the JobTracker.
Issues:
1. A scalability bottleneck is caused by having a single JobTracker. Limits are reached with a cluster of 5,000 nodes and 40,000 tasks running concurrently.
2. Neither small nor large Hadoop clusters used their computational resources with optimum efficiency. The cluster administrator divides the computational resources on each slave node into a fixed number of map/reduce slots. Even when no reduce tasks are running, a node can only run as many map tasks as there are available map slots, and vice versa.
3. Hadoop was designed to run MapReduce jobs. This increases the need to support other data processing frameworks that could run on the same cluster and share resources in an efficient and fair manner.
Addressing the scalability issue: The JobTracker is responsible for
1. Cluster resource management. Managing computational resources in the cluster involves maintaining the list of live nodes and the lists of available and occupied map and reduce slots, and allocating available slots to appropriate jobs and tasks according to the selected scheduling policy.
2. Task coordination. Coordinating all tasks running on a cluster involves instructing TaskTrackers to start map and reduce tasks, monitoring the execution of the tasks, restarting failed tasks, speculatively running slow tasks, calculating total values of job counters, and more.

The JobTracker constantly keeps track of thousands of TaskTrackers, hundreds of jobs, and tens of thousands of map and reduce tasks. On the other hand, a TaskTracker usually runs only a dozen tasks. One solution is to reduce the responsibilities of the single JobTracker and delegate some of them to the TaskTrackers, since there are many of them in a cluster. This is done by separating the dual responsibilities of the JobTracker (cluster resource management and task coordination) into two distinct types of processes. YARN introduces a cluster manager that is only responsible for tracking live nodes and the available resources in the cluster and assigning them to tasks. For each job submitted to the cluster, a TaskTracker starts a dedicated and short-lived JobTracker to control the execution of the tasks within the job. In doing so, the coordination of a job's lifecycle is spread across all the available machines in the cluster. More jobs can run in parallel and more nodes/tasks can be used, which increases scalability.
Name changes:
• ResourceManager instead of a cluster manager
• ApplicationMaster instead of a dedicated and short-lived JobTracker
• NodeManager instead of TaskTracker
• A distributed application instead of a MapReduce job
This research was required to determine whether YARN can be beneficial to MASS HDFS. YARN is a rewritten architecture of the Hadoop cluster, and both small and large Hadoop clusters greatly benefit from it. It is suitable for programs like MapReduce that need dynamic resource utilization on the Hadoop framework. MASS HDFS does not use YARN. In MapReduce, tasks get sent to where the data reside for processing. However, a MASS Java process does not decide where it should go to perform the task, but instead which data to retrieve for the task to perform. Each agent retrieves the data and reads them into a Place to process.
Hadoop Setup Phase

Work log (date: work done):

Text File
• 4/9: Install Hadoop and set up
• 4/10: Development environment set-up
• 4/11 - 4/13: Run Michael's Parallel IO test, wrote scripts for development use; successfully running MASS with 1 node (no HDFS)
• 4/14 - 4/15: Stuck on MASS init and Hadoop; add secondary namenode (UW1-320-09) to resolve a Hadoop issue; alter the openTextFile method in the Place class
• 4/17 - 4/18: Set up the distributed environment; generate authentication key; trouble running MASS on remote; debug issues and a binding issue; "Hadoop class not found" issue
• 4/19 - 4/20: Rewrite program due to Michael's code clean-up; tested rewritten MASS HDFS, failed on connection refused; the bind issue also occurred when running Parallel IO
• 4/23 - 4/24: Reformat Hadoop and test HDFS operations; test MASS HDFS - failed on connection refused; reformat Hadoop and test HDFS operations again
• 4/25: Reformat Hadoop and stop calling ./sbin/stop-dfs.sh; Michael's code is causing issues, so start on a separate project
• 4/26: Reformat Hadoop; create a new Maven HDFS client; can't run on remote due to manifest.txt
• 4/27: Recreate the Maven HDFS client; can't run on remote due to manifest.txt; create a non-Maven project, MassHDFS, and add dependencies manually; issue: MassHDFS not finding the file in HDFS
• 4/28: With Prof. Fukuda, successfully set up Configuration in MassHDFS; issue: can't find files because the HDFS home is set to the local directory
• 5/1: Running MassHDFS using the Hadoop command works; modifying and running MASS HDFS using the Hadoop command also works

NetCDF File
• 5/2: Start on NetCDF; issue: failed on reading NetCDF1000 from HDFS; debug
• 5/4 - 5/18: Debug; pull Michael's new changes; issue: Log4j class not found; issue: OutOfMemory - heap size; issue: openForRead using the NetCDF API
• 5/22: Reformat HDFS nodes with a replication factor of 8; set the Java heap size from the hadoop-env file
• 5/23 - 5/24: Reformat for the heap size change; issue: after changing the heap size in hadoop-env, it still doesn't work; test NetCDF on Parallel IO without MASS HDFS
• 5/25: Check with Michael - NetCDF works with his branch of code, so the problem is from merging our code; create a new branch and rewrite - everything works
• 5/26: Small bug fix; clean-up; test (pass) with NetCDF50, NetCDF100, and a text file

Evaluation
• 6/2 - 6/5: Issue: creating 10 G and 50 G dummy text files on a UW machine runs out of memory; issue: trying MASS with 2 nodes but getting a ~/.ssh/id_rsa issue; issue: getting authentication issues
• 6/7: Create mass_java_appl (a MASS application) to make sure the issue isn't from MASS; issue: stuck on school machines not working
• 6/10 - 6/16: Install Hadoop on dslab instead of shihy4 - failed because the file size is too large and can't log back in; issue: school machines connection issue - can't log in from home; term report; create a write function - not working
Hadoop Installation

HDFS uses a master-slave architecture to enable automatic data distribution, and I combine Parallel IO with HDFS, which I call MASS HDFS, to handle file storage and transfer. Ideally, the number of MASS nodes should equal the number of HDFS nodes. In hdfs-site.xml, the replication factor should be set equal to the number of HDFS nodes so that every node in the cluster possesses the entire file. This way, using the same nodes for MASS and HDFS reduces network delay, since every MASS node has a copy of the file.
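For an 8-node HDFS cluster, the corresponding hdfs-site.xml property would look roughly like this (a sketch of the relevant fragment, assuming Hadoop's standard configuration format):

```xml
<!-- hdfs-site.xml: replicate every block to all 8 HDFS nodes so that
     each MASS node holds a full local copy of the file -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
</configuration>
```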
I am using uw1-320-03 as the master node. Servers 00, 01, 02, 03, 04, 05, 06, and 07 are set up as slave nodes, and uw1-320-09 is the secondary namenode.
In hadoop-env.sh, I set the heap size to 20 G to avoid the OutOfMemory issue when using MASS for NetCDF files.
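In Hadoop 2.x, hadoop-env.sh takes the heap size in megabytes, so a 20 G heap would look roughly like this (a sketch; only the "20 G" figure comes from the project notes):

```sh
# hadoop-env.sh: give the Hadoop daemons a 20 GB JVM heap
# (HADOOP_HEAPSIZE is specified in MB) to avoid OutOfMemoryError
# on large NetCDF reads
export HADOOP_HEAPSIZE=20480
```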
Text File

The original TxtFile class in Parallel IO uses the file system interface to open and read files. I enabled HDFS file reads using the Hadoop client code in the Parallel IO TxtFile. The integrated Parallel IO (MASS HDFS) can directly read a text file from HDFS into a MASS Place properly. This code is tested using 1 MASS node as well as 4 and 8 HDFS nodes.
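The TxtFile change can be illustrated with the standard Hadoop client calls. The namenode URI, port, and file path below are placeholders, and the sketch requires hadoop-client on the classpath and a running cluster, so it is illustrative rather than the project's actual code:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch of reading a text file from HDFS with the Hadoop client,
// in the spirit of the TxtFile change described above.
public class HdfsTextRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(
            URI.create("hdfs://uw1-320-03:9000"), conf); // placeholder URI
        try (FSDataInputStream in = fs.open(new Path("/data/sample.txt"));
             BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line would be handed to a Place
            }
        }
    }
}
```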
NetCDF File

Instead of reading the file using the file system interface, NetCDFFile uses the NetCDF API to read the file on the local machine and then reads it into a MASS Place. To enable HDFS file reads, I used the HDFS copyToLocal method to transfer the requested file from HDFS to the local disk. Then, Parallel IO reads the file from the local disk into a MASS Place as in the TxtFile class. The code is tested using 1 MASS node as well as 4 and 8 HDFS nodes.
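The two-step NetCDF read can be sketched with the Hadoop and NetCDF-Java APIs: copy the file out of HDFS to the local disk, then open it with the NetCDF library. The paths and namenode URI are placeholders, and the sketch requires hadoop-client and netcdf-java plus a running cluster, so it is illustrative only:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import ucar.nc2.NetcdfFile;

// Sketch of the NetCDFFile approach described above:
// stage the file locally, then read it with the NetCDF API.
public class HdfsNetcdfRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
            URI.create("hdfs://uw1-320-03:9000"), new Configuration());
        Path src = new Path("/data/climate.nc"); // file in HDFS (placeholder)
        Path dst = new Path("/tmp/climate.nc");  // local staging copy
        fs.copyToLocalFile(src, dst);            // HDFS -> local disk
        try (NetcdfFile nc = NetcdfFile.open(dst.toString())) {
            System.out.println(nc.getVariables()); // then read into Places
        }
    }
}
```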
Another option I had was to change the NetCDF API implementation. However, Prof. Fukuda and I inspected the open-source code and decided to leave it as a possible future project for the time being.
Issues

Reformatting Hadoop multiple times

HDFS had to be reformatted for multiple reasons: UW server connection issues, reformatting Hadoop to test with different numbers of nodes, changing the Hadoop Java heap size, and moving Hadoop from my personal school account to dslab. I encountered several Hadoop issues with connection refusal and binding. At first, I thought the issue was caused by my implementation, but it turned out to be because of the school servers' instability and frequent calls to "./sbin/start-dfs.sh" and "./sbin/stop-dfs.sh". When one of the servers gets rebooted, it clears the HDFS configuration in the tmp directory, which requires reformatting.
Connection issues

This issue caused the main delay in my project. While working on TxtFile, I could not get MASS to run with multiple nodes because of connection issues. This caused a week of delay for both Michael and me. We suspect the issue came either from MASS or from the instability of U-Drive, so we switched to developing with only one MASS node. Now, I am at the end of the Hadoop phase, where both TxtFile and NetCDFFile work with one MASS node and eight Hadoop nodes. However, I am currently stuck on getting it to run with multiple MASS nodes due to an authentication error, which is causing a huge delay for my evaluation. Although I have followed the instructions and generated the authentication keys multiple times, MASS still can't run with multiple nodes on my personal account. After meeting with Prof. Fukuda, it seems that other MASS applications run correctly with multiple nodes using the dslab account, so I will be switching to dslab for the evaluation.
Development issues

Michael and I were working in parallel throughout April and May. Because my code extends Michael's code, I had to rewrite my code several times because of his changes in design, clean-up, and fixes. After Michael finished his work, I ran into a bug that stopped me from testing NetCDF for a while. This issue caused me another week of delay. I couldn't find what the problem was, but it worked after I created a new branch from Michael's development branch and rewrote everything. This suggests the problem may have come from not resolving a merge conflict correctly.
Another issue I had was not being able to connect to the HDFS cluster correctly. Because of this problem, I wrote a separate HDFSClient program and found that the connection fails when running the program with the java command. Here is an example of such an HDFSClient program and its usage:
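A minimal sketch of such a stand-alone client (the namenode URI, port, and path are placeholders, and this is a reconstruction of the idea rather than the original listing) connects to the namenode, sets fs.defaultFS explicitly, and lists a directory to verify the connection:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Stand-alone client used to test connectivity to the HDFS cluster.
public class HDFSClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://uw1-320-03:9000"); // placeholder URI
        FileSystem fs =
            FileSystem.get(URI.create(conf.get("fs.defaultFS")), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            // should print hdfs:// paths, not local file system paths
            System.out.println(status.getPath());
        }
    }
}
```

Running the jar with "java -jar" reproduced the connection failure, whereas launching it through "./bin/hadoop jar" worked.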
After researching on the internet, I learned that this could be a bug in the Hadoop client code, as many people reported the same behavior after following the Hadoop instructions and adding the configurations. The HDFS home directory was still set to the local home directory; therefore, any HDFS command performed results in failure, since the HDFS path doesn't exist on the local system. To solve this issue, we decided to use the Hadoop command (./bin/hadoop jar <jarfile> <args>) to run MASS Java instead of "java -jar".
Next Step

As I mentioned, I am trying to transfer the Hadoop setup from my personal account to the dslab account. If I can run MASS with multiple nodes using dslab, then I will conduct my performance evaluation over 1, 4, and 8 nodes. Otherwise, I will have to discuss the issue with the research group and find out where the issue resides in MASS. After the evaluation, I will start my next development phase, System Integration, to integrate MASS HDFS with the UW Climate Analysis web application.