MASS HDFS: Multi-Agent Spatial Simulation Hadoop Distributed File System

Yun-Ming Shih

Capstone Project Term Report I
Master of Science in Computer Science & Software Engineering
University of Washington
06/17/2017

Project Committee:
Munehiro Fukuda, Committee Chair
Michael Stiber, Committee Member
Johnny Lin, Committee Member

Report source: depts.washington.edu/dslab/MASS/reports/JasShih_sp17.pdf


Background

An increasing amount of data and processing needs is pushing the development of parallelized big data analysis. Most approaches deal with data that has a simple structure, like CSV and SQL. Science data for climate analysis has a complex structure, which is not well supported. To expand the use of parallelized big data analysis within science-related fields, Prof. Fukuda and his research group proposed a multi-agent based method that can process multi-dimensional NetCDF data. They demonstrated its practicality by incorporating the University of Washington Climate Analysis (UWCA) web application. It uses NetCDF software with the Parallel-Computing Library for Multi-Agent Spatial Simulation in Java (the MASS Java library).

The original version of MASS UWCA, implemented by Jason Woodring, has one master server that reads all the data from storage and sends it to the slave servers for processing. The issue with this application is the amount of time spent reading large files (in this case, 22 GB). At the time, the slow read performance was suspected to come from the design of having only the master server read the data and transfer it to the slaves. However, after manually duplicating the data to each of the slave servers, the improvement was far from meeting expectations. This suggests the issue may come from the implementation of reading data from the servers themselves into Places for processing.

In Autumn of 2015, a former student, Michael O'Keefe, proposed a solution to improve UWCA performance by adding MASS Parallel I/O to the MASS Java library. Parallel I/O is the MASS Java layer that allows efficient file reading and writing from every slave node to the MASS Places. This layer does not handle file transfer from master to slaves. The implementation made opening, reading, writing, and closing files possible at each slave server, with the assumption that the files exist. Although the idea comes from improving UWCA read performance, Michael's work has only been tested with MASS Java and has not been integrated with UWCA. My proposed project, MASS HDFS, focuses on handling data storage and data transfer. This will be done using the Hadoop Distributed File System (HDFS). In the following sections, I will discuss the literature review that I have done in choosing the file system and how I incorporate HDFS with MASS Parallel I/O.


Literature Review

In this phase of the project, I reviewed Big Data and Hadoop literature to help me understand the topic in depth prior to my Hadoop setup process.

Big Data

Big data can be found in 3 forms: structured, unstructured, and semi-structured, which can contain both forms (for example, an XML file). The format of structured data is well known in advance. Like a relational database, it can store, access, and process any data in a fixed format. The issue with structured data is that its size grows very large (zettabyte = 1 billion terabytes). The unstructured form of big data refers to any data whose form or structure is unknown. Google search, or document processing, is an example of unstructured big data. When the size is large, it is difficult to derive value out of it. As described, all forms of big data have the Four-V characteristics:

• Volume
• Variety - heterogeneous sources and the nature of data, both structured and unstructured
• Velocity - speed of data generation (data flows in from business processes, application logs, networks and social media sites, sensors, mobile devices, etc.)
• Variability - the inconsistency which can be shown by the data at times

With big data, businesses can utilize outside intelligence while making decisions, improve customer service over time, identify risks to products and services early, and achieve better operational efficiency. However, in a geospatial domain like climate analysis, data usually has higher intensity structures like NetCDF. These data, with exponential growth of data relationships, are produced by various sensors that are distributed over the environment and record physical changes. The data are then accessed and used with scientific models for simulating and predicting the phenomena. Techniques for data storing, real-time data accessing, and handling remain challenging for big data analysis in this domain.

Hadoop - HDFS

Hadoop is a framework that enables distributed processing of large data across clusters of commodity servers. It is composed of four core components: Hadoop Common, HDFS, MapReduce, and YARN. Hadoop Common is a set of utilities and libraries that can be used in other modules of Hadoop or in other programs. For example, I am using Hadoop Common to establish the connection between MASS Java Parallel IO and HDFS. The other core components mentioned above are introduced in the following sections.

HDFS Architecture:
In MASS HDFS, data storing and transferring is handled by HDFS. It is formed with NameNode, DataNode, and Secondary NameNode. It operates on a Master-Slave architecture model with one namenode and multiple datanodes.

o Namenode is master of cluster (UW1-320-03)
  ▪ Stores metadata and file directory
  ▪ Metadata
    • File name, File size, Number of blocks, Block IDs, User, Group, Permission, Replication, Block size, etc.
  ▪ Metadata stored in RAM and disk (stores data on disk so that if the namenode fails, information can be recovered from the disk)
  ▪ Namenode doesn't store actual data (Datanode does)
  ▪ Namenode knows which datanodes of the entire cluster are active or down
    • Datanodes send a heartbeat every 3 seconds
    • Namenode waits for 10 min to determine if the datanode is out of service
o Datanodes are slave servers (UW1-320-00, 01, 02, 04, 05, 06, 07)
  ▪ Data are stored as blocks
  ▪ Block sizes are usually 128 MB
  ▪ The data gets divided first, then gets stored to datanodes based on the replication factor number
    • Last block size <= block size
  ▪ Why are blocks replicated? (See Block Replica Policy section)
    • Reliability
      o If block 1 in datanode 1 fails, you can still get block 1 from datanodes 2 and 5.
      o If datanode 1 itself is down, then the replicas in nodes 2, 3, 4, and 5 will make more replicas on the available nodes so that the number of replicas still matches the replication factor
    • Datanodes have no knowledge of files in HDFS, they only have knowledge about blocks
    • Datanodes scan all blocks on disks and generate a block report - block report has a block version used for appending operation
      o Block reports happen at startup and periodically
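The block arithmetic described above can be sketched in a few lines. This is a minimal illustration rather than HDFS code: the function names `split_into_blocks` and `raw_storage` and the 300 MB example file are assumptions made for the example, and only the 128 MB block size comes from the text.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted in the text


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is divided into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # last block size <= block size
    return sizes


def raw_storage(file_size, replication):
    """Total bytes kept across the cluster when every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    mb = 1024 * 1024
    size = 300 * mb  # hypothetical file, chosen only for illustration
    blocks = split_into_blocks(size)
    print(len(blocks), "blocks:", [b // mb for b in blocks], "MB")
    print(raw_storage(size, 3) // mb, "MB stored with replication factor 3")
```

Only the final block can be shorter than the block size, which is why a file slightly larger than a multiple of 128 MB still costs one extra block entry in the namenode metadata.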

HDFS Read Operation:
• Steps to read a file from HDFS:
  o Client must call open() - This will make an RPC call to the namenode to get the block IDs and locations for the first few blocks.
    ▪ The returned list of datanodes is sorted by network distance
  o The client will then directly contact the closest datanode to request the transfer of the requested block. If the read fails, the client will contact the next closest datanode. The same process is repeated until the whole file is transferred block by block.
  o This process of requesting a specific block from one datanode after another is concealed from the client application. The client sees this as a continuous stream of data.
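The read steps above can be sketched as a small simulation. It is not the Hadoop client: `read_file`, `make_fetch`, and the in-memory cluster dictionary are hypothetical stand-ins that only mimic the "try the closest replica, fall back to the next one" behaviour.

```python
def read_file(block_locations, fetch):
    """Read every block in order, trying replicas from nearest to farthest.

    block_locations: list of (block_id, [datanodes sorted by distance]),
                     standing in for what the namenode returns on open().
    fetch: callable (datanode, block_id) -> bytes, raising IOError on failure.
    """
    data = bytearray()
    for block_id, datanodes in block_locations:
        for node in datanodes:
            try:
                data += fetch(node, block_id)
                break  # block transferred, continue with the next block
            except IOError:
                continue  # fall back to the next closest replica
        else:
            raise IOError(f"no live replica for block {block_id}")
    return bytes(data)  # the caller only sees one continuous byte stream


def make_fetch(cluster, down=()):
    """Build a fetch function over an in-memory {node: {block_id: bytes}} map."""
    def fetch(node, block_id):
        if node in down or block_id not in cluster.get(node, {}):
            raise IOError(f"{node} cannot serve block {block_id}")
        return cluster[node][block_id]
    return fetch


if __name__ == "__main__":
    cluster = {
        "dn1": {0: b"hello ", 1: b"world"},
        "dn2": {0: b"hello ", 1: b"world"},
    }
    locations = [(0, ["dn1", "dn2"]), (1, ["dn1", "dn2"])]
    print(read_file(locations, make_fetch(cluster, down={"dn1"})))
```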

HDFS Write Operation:
• Steps to write a file to HDFS:
  o Client must call create() - This will make an RPC call to the namenode.
    ▪ Namenode ensures the file does not already exist and checks the client's write permission.
    ▪ Client asks the namenode to allocate the file in blocks (128 MB). Based on the replication factor, the namenode returns a list of datanodes sorted by network distance.
  o The client will then directly flush the data to the closest datanode in 4K packets. Then, that datanode will forward the packet to its closest datanode, and so on. Each datanode sends an acknowledge message to its requester. This is how the load is distributed in an HDFS cluster.
  o When the number of replicas meets the replication factor, the namenode will update the block location memory. The same procedure is repeated until all blocks are stored in HDFS. The client calls close() to complete writing data to HDFS.
  o This process of forwarding a block from one datanode to another is concealed from the client application. The client sees this as a continuous stream of data.
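The write pipeline above can be sketched as follows. This is a toy model under stated assumptions: the `DataNode` class, `packets`, and `write_block` are hypothetical names, the acknowledgements are simply returned up the call chain, and the 4 KB packet size is taken from the text rather than from any Hadoop setting.

```python
PACKET_SIZE = 4 * 1024  # 4 KB packets, as stated in the text


def packets(data, size=PACKET_SIZE):
    """Cut a block's bytes into fixed-size packets (the last may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


class DataNode:
    def __init__(self, name):
        self.name = name
        self.stored = bytearray()

    def receive(self, packet, downstream):
        """Store the packet, forward it down the pipeline, return the acks."""
        self.stored += packet
        acks = [self.name]
        if downstream:
            acks += downstream[0].receive(packet, downstream[1:])
        return acks


def write_block(data, pipeline):
    """Send one block through a pipeline of datanodes sorted by distance."""
    if not pipeline:
        raise ValueError("pipeline needs at least one datanode")
    for packet in packets(data):
        acks = pipeline[0].receive(packet, pipeline[1:])
        if len(acks) != len(pipeline):
            raise IOError("a datanode in the pipeline did not acknowledge")
    return [node.name for node in pipeline]


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    print(write_block(b"x" * 10000, nodes))
    print([len(n.stored) for n in nodes])
```

The client only talks to the first datanode, so its outgoing bandwidth is spent once per block regardless of the replication factor.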

Block Replica Policy:
Block replica placement policy is based on reliability, availability, and network bandwidth utilization. Suppose we have a replication factor of 3, 4 racks, and 4 datanodes:

Scenario 1: When data is written from the outside world to HDFS (copying data to HDFS):
o A datanode is chosen randomly to store the first replica.
o Then, a node from a different rack is chosen to store the second replica.
o The third replica is stored on a different node of the same rack as the second replica.
o This way, if one rack fails, you will still have another rack available.

Scenario 2: When data is written by some task inside the cluster:
o The first replica is stored on the datanode where the task runs.
o The second and third replicas are stored on different nodes of the same rack, but a different rack from that of the first replica.
o This way, if one rack fails, you will still have another rack available.
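Both scenarios follow the same rule once the first node is fixed, which the sketch below tries to capture. It is an illustration under assumptions, not the HDFS placement code: `place_replicas`, the rack names, and the six-node topology are invented for the example, and the replication factor is fixed at 3.

```python
import random


def place_replicas(topology, writer=None, rng=random):
    """Pick three datanodes for one block.

    topology: dict mapping rack name -> list of datanode names.
    writer:   datanode running the writing task (Scenario 2), or None when
              the data comes from outside the cluster (Scenario 1).
    """
    node_rack = {n: r for r, nodes in topology.items() for n in nodes}
    # First replica: the writer's own node, or a random node for outside data.
    first = writer if writer in node_rack else rng.choice(sorted(node_rack))
    # Second and third replicas: two different nodes on one other rack.
    remote_racks = [r for r, nodes in topology.items()
                    if r != node_rack[first] and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need another rack with at least two datanodes")
    rack = rng.choice(sorted(remote_racks))
    second, third = rng.sample(sorted(topology[rack]), 2)
    return [first, second, third]


if __name__ == "__main__":
    topology = {"rack1": ["a", "b"], "rack2": ["c", "d"], "rack3": ["e", "f"]}
    print(place_replicas(topology, writer="a"))  # Scenario 2
    print(place_replicas(topology))              # Scenario 1
```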

Trade-off: If the number of replicas is high, the system is highly reliable and available. However, more network bandwidth is used and writes are less efficient (a write operation is expensive because every replica has to travel over the network). If the number of replicas is low, the system is less reliable and available, but less network bandwidth is used, which gives better write performance for the same reason.

What happens when a DataNode is Out of Service?
If, for some reason, datanode 1 is down, it will not send a heartbeat to the namenode. The namenode will wait 10 min for datanode 1 to send its heartbeat and then decide that datanode 1 is out of service. Fortunately, the blocks are still available on other nodes, but the cluster will be under-replicated. As a result, the namenode will schedule a job to make more replicas on other datanodes. Then, the datanode that received a new replica will send a block report to the namenode, and the namenode will update its block location mapping.
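The failure detection described above can be sketched as two small functions. This is a simplified model, assuming timestamps in seconds and a plain dictionary for the block map; `dead_nodes` and `under_replicated` are hypothetical names, and the 3 second and 10 minute values are the ones quoted in the text.

```python
HEARTBEAT_INTERVAL = 3   # seconds between heartbeats (from the text)
DEAD_AFTER = 10 * 60     # seconds of silence before a node is declared dead


def dead_nodes(last_heartbeat, now, timeout=DEAD_AFTER):
    """Return the datanodes whose last heartbeat is older than the timeout."""
    return sorted(n for n, t in last_heartbeat.items() if now - t > timeout)


def under_replicated(block_map, dead, replication):
    """Return {block_id: replicas_missing} for blocks short of live replicas."""
    dead = set(dead)
    missing = {}
    for block_id, nodes in block_map.items():
        live = [n for n in nodes if n not in dead]
        if len(live) < replication:
            missing[block_id] = replication - len(live)
    return missing


if __name__ == "__main__":
    # dn1 last reported at t=0; the others are still sending heartbeats.
    last_heartbeat = {"dn1": 0, "dn2": 600, "dn3": 598, "dn4": 599}
    block_map = {1: ["dn1", "dn2", "dn3"], 2: ["dn2", "dn3", "dn4"]}
    dead = dead_nodes(last_heartbeat, now=601)
    print(dead, under_replicated(block_map, dead, replication=3))
```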


Hadoop YARN:
YARN is a completely rewritten architecture of the Hadoop cluster. It offers clear advantages in scalability, efficiency, and flexibility compared to the classical MapReduce engine in the first version of Hadoop (MRv1).

Limitations:
MRv1 has limitations related to scalability, resource utilization, and the support of workloads other than MapReduce. Job execution is controlled by two types of processes:

• A single master process called JobTracker - coordinates all jobs running on the cluster and assigns map and reduce tasks to run on the TaskTrackers.
• A number of subordinate processes called TaskTrackers - run assigned tasks and periodically report the progress to the JobTracker.

Issues:
1. A scalability bottleneck is caused by having a single JobTracker. Limits are reached with a cluster of 5000 nodes and 40000 tasks running concurrently.
2. Neither small nor large Hadoop clusters used their computational resources with optimum efficiency. The cluster administrator divides the computational resources on each slave node into a fixed number of map/reduce slots. Even when no reduce tasks are running, nodes can only run a number of map tasks up to the number of available map slots, and vice versa.
3. Hadoop was designed to run MapReduce jobs. This increases the need to support other data processing frameworks that could run on the same cluster and share resources in an efficient and fair manner.
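Issue 2 can be made concrete with a small calculation. The sketch below assumes a hypothetical node with four map slots and four reduce slots and tasks of equal size; `mrv1_running` and `pooled_running` are invented names, and the pooled model is a simplification of YARN's resource sharing, not its actual scheduler.

```python
def mrv1_running(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fixed slots: a map task may only use a map slot, and vice versa."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def pooled_running(total_slots, pending_maps, pending_reduces):
    """Shared pool: any free capacity can run any pending task."""
    return min(total_slots, pending_maps + pending_reduces)


if __name__ == "__main__":
    # Ten map tasks are waiting and no reduce task has started yet.
    fixed = mrv1_running(map_slots=4, reduce_slots=4,
                         pending_maps=10, pending_reduces=0)
    pooled = pooled_running(total_slots=8, pending_maps=10, pending_reduces=0)
    print("fixed slots:", fixed, "tasks running; shared pool:", pooled)
```

With fixed slots, half of the node sits idle during the map phase even though work is waiting, which is the inefficiency the text describes.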

Addressing the scalability issue:
The JobTracker is responsible for:

1. Cluster resource management
Managing computational resources in the cluster involves: maintaining the list of live nodes and the list of available and occupied map and reduce slots, and allocating available slots to appropriate jobs and tasks according to the selected scheduling policy.

2. Task coordination
Coordinating all tasks running on a cluster involves: instructing TaskTrackers to start map and reduce tasks, monitoring the execution of the tasks, restarting failed tasks, speculatively running slow tasks, calculating total values of job counters, and more.

The JobTracker constantly keeps track of thousands of TaskTrackers, hundreds of jobs, and tens of thousands of map and reduce tasks. On the other hand, TaskTrackers usually run only a dozen tasks. One solution is to reduce the responsibilities of the single JobTracker and delegate some of them to the TaskTrackers, since there are many of them in a cluster. This is done by separating the dual responsibilities of the JobTracker (cluster resource management and task coordination) into two distinct types of processes. YARN introduces a cluster manager that is only responsible for tracking live nodes and the available resources in the cluster and assigning them to the tasks. For each job submitted to the cluster, a TaskTracker starts a dedicated and short-lived JobTracker to control the execution of the tasks within the job. In doing so, coordination of a job's life cycle is spread across all the available machines in the cluster. More jobs can run in parallel and more nodes and tasks can be handled, which increases scalability.

Name changes:
• ResourceManager instead of a cluster manager
• ApplicationMaster instead of a dedicated and short-lived JobTracker
• NodeManager instead of TaskTracker
• A distributed application instead of a MapReduce job

This research was required to determine whether YARN can be beneficial to MASS HDFS. YARN is a rewritten architecture of the Hadoop cluster, and both small and large Hadoop clusters greatly benefit from it. It is suitable for programs like MapReduce that need dynamic resource utilization on the Hadoop framework. MASS HDFS does not use YARN. In MapReduce, tasks get sent to where the data resides for processing. A MASS Java process, however, does not decide where it should go to perform the task, but instead which data to retrieve for the task to perform. Each agent retrieves the data and reads it into a Place to process.

Hadoop Setup

Worklog by phase and date.

Phase: Text File

4/9
  • Install Hadoop and set-up
4/10
  • Development environment set-up
4/11-4/13
  • Run Michael's Parallel test IO, wrote scripts for development use
  • Successfully running MASS with 1 node (no HDFS)
4/14-4/15
  • Stuck on MASS init and Hadoop
  • Add secondary namenode (UW1-320-09) resolve Hadoop issue
  • Alter openTextFile method in Place class
4/17-4/18
  • Set-up distributed environment
  • Generate authentication key
  • Have trouble running MASS on remote
  • Debug issues and binding issue
  • "Hadoop class not found" issue
4/19-4/20
  • Rewrite program due to Michael's code cleaned up
  • Tested rewritten MASS HDFS, failed on connection refused
  • Bind issue also occurred when running Parallel IO
4/23-4/24
  • Reformat Hadoop and test HDFS operations
  • Test MASS HDFS - failed on connection refused
  • Reformat Hadoop and test HDFS operations
4/25
  • Reformat Hadoop and stop calling ./sbin/stop-dfs.sh
  • Michael's code is causing issues so start on a separate project
4/26
  • Reformat Hadoop
  • Create new maven HDFS client; Can't run on remote due to manifest.txt
4/27
  • Recreate maven HDFS client; Can't run on remote due to manifest.txt
  • Create a non-maven project MassHDFS. Add dependencies manually
  • Issue: MassHDFS not finding file in HDFS
4/28
  • With Prof. Fukuda, successfully set up Configuration in MassHDFS
  • Issue: Can't find files because hdfs home is set to local directory
5/1
  • Run MassHDFS using Hadoop command works
  • Modify and run MASS HDFS using Hadoop command also works

Phase: NetCDF File

5/2
  • Start on NetCDF
  • Issue: Failed on reading NetCDF1000 from HDFS
  • Debug
5/4-5/18
  • Debug
  • Pull Michael's new changes
  • Issue: Log4j class not found issue
  • Issue: OutOfMemory - heap size issue
  • Issue: openForRead using NetCDF API
5/22
  • Reformat HDFS to user nodes with 8 replication factor
  • Set Java heap size from Hadoop-env file
5/23-5/24
  • Reformat for heap size change
  • Issue: after changing heap size in Hadoop-env, still doesn't work.
  • Test NetCDF on Parallel IO without MASS HDFS
5/25
  • Check with Michael - NetCDF works with his branch of code so problem is from merging our code
  • Create a new branch and rewrite - everything works
5/26
  • Small bug fix
  • Clean up
  • Test (pass) with NetCDF50, NetCDF100, and Txt file

Phase: Evaluation

6/2-6/5
  • Issue: Create 10G and 50G dummy text file on UW machine and run out of memory
  • Issue: trying with MASS 2 nodes but getting ~./ssh/id_rsa issue
  • Issue: getting authentication issues
6/7
  • Create mass_java_appl (MASS application) to make sure the issue isn't from MASS
  • Issue: stuck on school machines not working
6/10-6/16
  • Install Hadoop on dslab instead of shihy4 - failed because file size too large and can't log back in
  • Issue: school machines connection issue - can't log in from home
  • Term report
  • Create Write function - not working

Hadoop Installation

HDFS uses a master-slave architecture to enable automatic data distribution, and I combine Parallel IO with HDFS, which I call MASS HDFS, to handle file storing and transferring. Ideally, the number of MASS nodes should equal the number of HDFS nodes. In hdfs-site.xml, the replication factor should be set equal to the number of HDFS nodes so that every node in the cluster possesses the entire file. This way, using the same nodes for MASS and HDFS reduces network delay, since every MASS node has a copy of the file.
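The cost and benefit of setting the replication factor equal to the number of nodes can be estimated with a short calculation. This is a rough sketch under a simplifying assumption: replicas are treated as spread uniformly over distinct nodes, which ignores rack-aware placement. The function names are hypothetical; the 22 GB file size and the 8-node cluster are the figures mentioned in this report.

```python
def total_storage_gb(file_gb, replication):
    """Disk space used across the whole cluster for one file."""
    return file_gb * replication


def local_read_fraction(num_nodes, replication):
    """Chance that a given node already holds a given block locally,
    assuming replicas sit on distinct, uniformly chosen nodes."""
    if not 1 <= replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    return replication / num_nodes


if __name__ == "__main__":
    nodes, file_gb = 8, 22
    for replication in (1, 3, 8):
        print("replication", replication,
              "storage_gb", total_storage_gb(file_gb, replication),
              "local_fraction", local_read_fraction(nodes, replication))
```

Full replication trades disk space for locality: every read becomes local, at the price of storing the file once per node.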

I am using uw1-320-03 for the master node. Servers 00, 01, 02, 03, 04, 05, 06, and 07 are set up as slave nodes, and uw1-320-09 is the secondary namenode.

In Hadoop-env.sh, I set the heap size to 20G to avoid the OutOfMemory issue when using MASS for NetCDF files.


Text File

The original TxtFile class in Parallel IO uses the file system interface to open and read files. I enabled HDFS file reads using the Hadoop client code in the Parallel IO TxtFile. The integrated Parallel IO (MASS HDFS) can read a text file directly from HDFS into a MASS Place. This code was tested using 1 MASS node as well as 4 and 8 HDFS nodes.


NetCDF File

Instead of reading the file using the file system interface, NetCDFFile uses the NetCDF API to read the file on the local machine and then reads it into a MASS Place. To enable HDFS file reads, I used the HDFS copyToLocal method to transfer the requested file from HDFS to the local machine. Then, Parallel IO reads the file from local storage into a MASS Place, as in the TxtFile class. The code was tested using 1 MASS node as well as 4 and 8 HDFS nodes.
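The stage-then-read pattern described above can be sketched independently of Hadoop and NetCDF. In this illustration, `read_via_local_copy` is a hypothetical helper, `shutil.copy` stands in for the HDFS copy-to-local step, and `reader` stands in for whatever opens the local file (the NetCDF API in the report); none of these names come from the MASS code.

```python
import os
import shutil
import tempfile


def read_via_local_copy(remote_path, reader, copy_to_local=shutil.copy):
    """Copy a file to local scratch space, then hand the local path to a reader.

    copy_to_local: callable (src, dst) that fetches the file; shutil.copy is
                   only a stand-in for a distributed file system transfer.
    reader:        callable (local_path) -> result, for libraries that can
                   only open files on the local file system.
    """
    with tempfile.TemporaryDirectory() as scratch:
        local_path = os.path.join(scratch, os.path.basename(remote_path))
        copy_to_local(remote_path, local_path)
        return reader(local_path)  # the scratch copy is removed afterwards


def read_bytes(path):
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False, suffix=".nc") as source:
        source.write(b"pretend this is NetCDF data")
    try:
        print(read_via_local_copy(source.name, read_bytes))
    finally:
        os.unlink(source.name)
```

The extra copy costs local disk space and one full transfer per file, which is the price of leaving the NetCDF reader itself unchanged.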

Another option I had was to change the NetCDF API implementation. However, Prof. Fukuda and I inspected the open source code and decided to leave it as a possible future project for the time being.

Issues

Reformatting Hadoop multiple times
HDFS had to be reformatted for multiple reasons: UW server connection issues, testing with different numbers of nodes, changing the Hadoop Java heap size, and moving Hadoop from my personal school account to dslab. I encountered several Hadoop issues with connection refusal and binding. At first, I thought the issue was caused by my implementation, but it turned out to be caused by the school servers' instability and by frequently calling "./sbin/start-dfs.sh" and "./sbin/stop-dfs.sh". When one of the servers gets rebooted, it clears the HDFS configuration in the tmp directory, which requires reformatting.

Connection issues
This issue caused the main delay in my project. While working on TxtFile, I could not get MASS to run with multiple nodes because of connection issues. This caused a week of delay for both Michael and me. We suspected the issue came either from MASS or from the instability of U-Drive, so we switched to development using only one MASS node. Now, I am at the end of the Hadoop phase, where both TxtFile and NetCDFFile work with one MASS node and eight Hadoop nodes. However, I am currently stuck on getting it to run with multiple MASS nodes due to an authentication error. This is causing a huge delay for my evaluation. Although I have followed the instructions and generated the authentication keys multiple times, MASS still can't run with multiple nodes on my personal account. After meeting with Prof. Fukuda, it seems that other MASS applications run correctly with multiple nodes using the dslab account, so I will be switching to dslab for the evaluation.

Development Issues
Michael and I were working in parallel throughout April and May. Because my code extends Michael's code, I had to rewrite my code several times because of his changes in design, cleanup, and fixes. After Michael finished his work, I ran into a bug that stopped me from testing NetCDF for a while. This issue caused me another week of delay. I couldn't find what the problem was, but it worked after I created a new branch from Michael's development branch and rewrote everything. This suggests the problem may have come from not resolving a merge conflict correctly.

Another issue I had was not being able to connect to the HDFS cluster correctly. Because of this problem, I wrote a separate HDFSClient program and found that the connection fails when running the program using the java command. Here is an example of my separate HDFSClient program and its usage:


After researching on the internet, I learned that this could be a bug in the Hadoop client code, as many people reported following the Hadoop instructions and being able to add the configurations as well. However, the HDFS home directory was still set to the local home directory. Therefore, any HDFS command results in failure, since the HDFS path doesn't exist on the local system. To solve this issue, we decided to use the Hadoop command (./bin/hadoop jar <jarfile> <args>) to run MASS Java instead of "java -jar".
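The symptom above is consistent with the client falling back to the local file system as its default when the cluster configuration is not picked up. The sketch below only illustrates that effect and is not Hadoop code: `qualify`, the host name `namenode`, port 9000, and the home directories are all hypothetical.

```python
from urllib.parse import urlparse


def qualify(path, default_fs, home):
    """Expand a possibly relative path against a default file system URI."""
    if urlparse(path).scheme:
        return path  # already fully qualified, e.g. hdfs://host:port/...
    if not path.startswith("/"):
        path = home.rstrip("/") + "/" + path  # relative paths hang off home
    return default_fs.rstrip("/") + path


if __name__ == "__main__":
    # Configuration not loaded: the default file system is the local one.
    print(qualify("data.txt", "file:///", "/home/user"))
    # Configuration loaded: the same relative path lands in HDFS.
    print(qualify("data.txt", "hdfs://namenode:9000", "/user/user"))
```

Under this reading, launching through the hadoop command helps because it puts the cluster configuration where the client can find it, so the same path resolves against HDFS instead of the local disk.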


Next Step

As I mentioned, I am trying to transfer the Hadoop setup from my personal account to the dslab account. If I can run MASS with multiple nodes using dslab, then I will conduct my performance evaluation over 1, 4, and 8 nodes. Otherwise, I will have to discuss the issue with the research group and find out where the issue resides in MASS. After the evaluation, I will start my next development phase, System Integration, to integrate MASS HDFS with the UW Climate Analysis web application.