csinparallel.org
Using Map-Reduce to Teach Parallel Programming Concepts
Dick Brown, St. Olaf College
Libby Shoop, Macalester College
Joel Adams, Calvin College
Workshop site
CSinParallel.org -> Workshops -> WMR Workshop
See also the workshop handout
Introductory comments
– Role of undergraduate researchers: there would be no workshop without them!
– Thanks to Amazon Web Services for providing credits to host our WMR instance
– Disclaimer: we are not proposing map-reduce as the only approach to introducing parallelism and concurrency
– A value: honor thy neighbor's curricular approach
Goals
– Introduce map-reduce computing, using the WebMapReduce (WMR) simplified interface to Hadoop
• Why use map-reduce in the curriculum?
– Hands-on exercises with WMR for foundation courses
Goals
Part 1: Introduction
– Map-reduce computing, and the WebMapReduce (WMR) simplified interface to Hadoop
– Hands-on exercises with WMR for foundation courses
Part 2: Teaching with WMR
– Why use map-reduce in the curriculum?
– Use of WMR for intermediate and advanced courses
– Hands-on exercises for more advanced use
Part 3 (optional): What's under the hood?
Sneak Preview: Materials available
(In case you already know your map-reduce…)
• CSinParallel module: Map-reduce Computing for Introductory Students using WebMapReduce
– See csinparallel.org
Part 1: Introduction to Map-Reduce Computing and WMR
Introduction to Map-Reduce Computing
History
– The computational model of using map and reduce operations was developed decades ago, in LISP
– Google developed its MapReduce system for its search engine and published it (Dean and Ghemawat, 2004)
– Hadoop, an open-source implementation developed largely at Yahoo! (now under Apache), uses Java mappers and reducers
Map-Reduce: The 2-minute overview
What if you wanted to count the frequencies of all words in 1,000,000 books?
1. Break up the lines of text: generate one labelled piece per word
• Use that word as label; value 1 for each piece
2. Group the pieces according to label (word)
3. Add up the 1's in each group
Map-Reduce Concept
The map-reduce computational model
• Map-reduce is a two-stage process with a "shuffle twist" between the stages.
• Stages are controlled by functions: mapper(), reducer()
The map-reduce computational model
• mapper() function:
– Argument is one line of input from a file
– Produces (key, value) pairs
• Example: word-count mapper()
"the cat in the hat" --> [mapper for this line] --> ("the", "1"), ("cat", "1"), ("in", "1"), ("the", "1"), ("hat", "1")
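A minimal Python sketch of the word-count mapper() described above (Python is one of the languages WMR supports). The function shape, one line in and a list of (key, value) pairs out, is an assumption made for illustration; WMR's own mapper interface may differ.

```python
def mapper(line):
    """Word-count mapper: produce ("word", "1") for every word in one input line.

    Assumption for illustration: the mapper receives a single line of text and
    returns its (key, value) pairs as a list; values are strings, as on the slide.
    """
    return [(word, "1") for word in line.split()]


if __name__ == "__main__":
    print(mapper("the cat in the hat"))
```

Note that "the" appears twice in the output: the mapper does no counting of its own, leaving that to the later stages.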
The map-reduce computational model
• Shuffle stage:
– Group together all mappers' (key, value) pairs that have the same key, and feed each group to its own call of reducer()
– Input: all (key, value) pairs from all mappers
– Output: those pairs rearranged and sent to calls of reducer() according to key
• Note: shuffle also sorts by key (an optimization)
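The shuffle stage can be imitated on a single machine with a dictionary. This sketch collects the pairs from all mappers, groups the values by key, and sorts the groups by key. The function name shuffle() is hypothetical, and a real Hadoop shuffle also moves data between machines, which is left out here.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs from all mappers by key.

    Returns a list of (key, [values...]) groups sorted by key; in a real
    system each group would be fed to its own reducer() call.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


if __name__ == "__main__":
    mapper_output = [("the", "1"), ("cat", "1"), ("in", "1"), ("the", "1"), ("hat", "1")]
    for key, values in shuffle(mapper_output):
        print(key, values)
```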
The map-reduce computational model
• reducer() function:
– Receives all key-value pairs for one key
– Produces an aggregate result
• Example: word-count reducer()
("the", "1"), ("the", "1") --> [reducer for "the"] --> ("the", "2")
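A matching Python sketch of the word-count reducer(). For convenience it assumes the reducer receives the key once, together with the list of that key's values (the shape a shuffle produces), rather than repeated pairs; this calling convention is an illustration, not WMR's documented interface.

```python
def reducer(key, values):
    """Word-count reducer: add up the "1" strings collected for one key.

    Returns a single (key, total) pair, with the total as a string to match
    the string values the mapper produced.
    """
    total = sum(int(v) for v in values)
    return (key, str(total))


if __name__ == "__main__":
    print(reducer("the", ["1", "1"]))
```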
The map-reduce computational model
– In map-reduce, a programmer codes only two functions (plus configuration information)
• A model for future parallel-programming frameworks
– The underlying map-reduce system reuses code for
• Partitioning the data into chunks and lines
• Running mappers/reducers where the chunks are local
• Moving data between mappers and reducers
• Auto-recovering from any crashes that may occur
• ...
– Optimized, Distributed, Fault-tolerant, Scalable
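To make this division of labor concrete, here is a toy, single-machine "framework" in Python. The caller writes only a mapper and a reducer; the generic run_mapreduce() helper (a hypothetical name, not Hadoop's API) partitions the input, runs the mappers concurrently with a thread pool, shuffles, and calls the reducer once per key. It provides none of the data locality, distribution, or crash recovery listed above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce(lines, mapper, reducer, num_chunks=4):
    """Toy map-reduce driver (hypothetical helper, not Hadoop's API).

    1. Partition the input lines round-robin into chunks.
    2. Run mapper() over each chunk concurrently.
    3. Shuffle: group all emitted values by key, sorted by key.
    4. Call reducer() once per key.
    """
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]

    def map_chunk(chunk):
        pairs = []
        for line in chunk:
            pairs.extend(mapper(line))
        return pairs

    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        mapped = list(pool.map(map_chunk, chunks))

    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    return [reducer(key, values) for key, values in sorted(groups.items())]


# The only two functions the "programmer" writes for word count:
def wc_mapper(line):
    return [(word, "1") for word in line.split()]


def wc_reducer(key, values):
    return (key, str(sum(int(v) for v in values)))


if __name__ == "__main__":
    text = ["the cat in the hat", "the cat sat", "a hat"]
    print(run_mapreduce(text, wc_mapper, wc_reducer))
```

Python threads stand in for cluster nodes here; in CPython they do not speed up CPU-bound work, so the sketch shows the structure of the computation, not its performance.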
The map-reduce computational model
• Optimized, Distributed, Fault-tolerant, Scalable
[Diagram: 1. mappers, 2. shuffle, 3. reducers; labelled "Local I/O" and "Global I/O"]
Demo of WMR
cumulus.cs.stolaf.edu/wmr
Intro module
Materials available
• CSinParallel module: Map-reduce Computing for Introductory Students using WebMapReduce
– See csinparallel.org
Overview of suggested exercises
Available on the csinparallel.org site
– Run word count (provided), with small and large data
– Modify and run variations on word count: strip punctuation; make it case-insensitive; etc.
– Alternative exercises
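One possible answer to the "strip punctuation; case-insensitive" variation, sketched in Python: the mapper lowercases each word and strips leading and trailing punctuation before emitting it, so "The" and "the," are counted as the same word. The normalization rules here are a choice made for illustration, not the module's reference solution; for instance, apostrophes inside words such as "don't" are kept.

```python
import string


def mapper(line):
    """Case-insensitive word-count mapper that strips surrounding punctuation."""
    pairs = []
    for token in line.split():
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that were pure punctuation, such as "--"
            pairs.append((word, "1"))
    return pairs


if __name__ == "__main__":
    print(mapper('"The cat," said the Hat -- in the HAT!'))
```

The word-count reducer needs no change: all the normalization happens in the mapper, before the shuffle groups the keys.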
Additional exercises
Beyond your first simple exercises, consider exploring the following:
• Various data sets
– Note: Please avoid large Gutenberg "groups" for this workshop
• Extended set of exercises for CS1 (text analysis)
Hands-on exploration of WMR
Part 2: Teaching with WMR
Why map-reduce? Why WMR?
Teaching WMR in CS1; in other courses
Why teach map-reduce?
Why map-reduce for teaching notions of parallelism/concurrency?
– Concepts:
• data parallelism
• task parallelism
• locality
• effects of scale
• an example of an effective parallel programming model
• distributed data with redundancy for fault tolerance
• ...
Why map-reduce for teaching notions of parallelism/concurrency?
– Real-world: Hadoop is widely used
– Exciting: the appeal of Google, Facebook, etc.
– Useful: for appropriate applications
– Powerful: scalability to large clusters, large data
Why WMR?
– Introduce concepts of parallelism
– Low bar for entry, feasible for CS1 (and beyond)
– Capture the imaginations of students
• Supports rapid introduction of concepts of parallelism for every CS student
– Intro module designed for 1-3 days of class
WebMapReduce (WMR)
– Simplified web interface for Hadoop computations
– Goals:
• Strategically simple: suitable for CS1, but not a toy
• Configurable: write mappers/reducers in any language
• Accessible web application
• Multi-platform, front-end and back-end
WMR Features (Briefly)
– Testing interface
• Error feedback
• Bypasses Hadoop: small data only!
– Students enter the following information:
• choice of language
• data to process
• definition of mapper in that language
• definition of reducer in that language
WMR system information
– Languages currently supported: Java, C++, Python, Scheme, C, C#, JavaScript
• R coming soon
– Back ends to date: local cluster, Amazon EC2 cloud images
• Version limits and more back ends coming soon
More details about the system in (optional) Part 3 of the workshop
Teaching map-reduce with WMR in the introductory sequence
Kinesthetic student activity
• Visualizations of map-reduce computations are enough for some students, but not all
• An in-class activity to act out the map/shuffle/reduce process helps others
• Also helpful: images of clusters; sequential versions; context of well-known web services
WMR in advanced courses
Example: PDC elective
• CS1 module
• Map-reduce programming techniques
– Features of WMR
– Context forwarding
– Structured values; structured keys
– Multi-case mappers; multi-case reducers
– Broadcasting data values
Examples for Text Processing Techniques
• Combining data within a mapper
– Mapper: tally counts of words before sending them to the reducer
• Computational linguistics:
– words that are co-located
• Find and count pairs
Example In: the cat in the cat hat
Emits: (cat|in, 1), (in|the, 1), (cat|hat, 1), (the|cat, 2)
• Use the combining procedure to find "stripes"
Example In: the cat and the dog fought over the dog bone
Emits (among others): (the, {cat:1, dog:2})
Thanks to: Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer
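The pairs and stripes examples can be sketched as Python mappers that use in-mapper combining: each mapper tallies counts in a local dictionary and emits each tally once, instead of emitting a separate count of 1 for every occurrence. Here "co-located" is taken to mean a word and its immediate right-hand neighbor, which reproduces both examples above; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def pairs_mapper(line):
    """Emit ("w1|w2", count) for adjacent word pairs, combined within the mapper."""
    words = line.split()
    counts = Counter(f"{a}|{b}" for a, b in zip(words, words[1:]))
    return sorted(counts.items())


def stripes_mapper(line):
    """Emit (word, {neighbor: count}) "stripes", one per distinct word in the line."""
    words = line.split()
    stripes = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        stripes[a][b] += 1
    return sorted((word, dict(stripe)) for word, stripe in stripes.items())


def stripes_reducer(word, stripes):
    """Merge all stripes for one word by adding counts neighbor by neighbor."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)
    return (word, dict(total))


if __name__ == "__main__":
    print(pairs_mapper("the cat in the cat hat"))
    print(dict(stripes_mapper("the cat and the dog fought over the dog bone"))["the"])
    print(stripes_reducer("the", [{"cat": 1, "dog": 2}, {"dog": 1, "hat": 1}]))
```

The trade-off the two patterns illustrate: pairs emit many small records, while stripes emit fewer, larger records that the reducer merges element by element.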
Application ideas
• Examples in the introductory module
• Big data sets people care about
• Especially for unstructured data
• Convenient for certain kinds of projects
– E.g., most common medical terminology
WMR Hands-on, continued
Module exercises
Extended exercise set
Data sets available:
/shared/MovieLens2/movieRatings
/shared/gutenberg/WarAndPeace.txt
/shared/gutenberg/CompleteShakespeare.txt
Part 3: What's under the hood
About WMR
• WMR and its architecture
• Obtaining and installing WMR
– WebMapReduce.sf.com
[Architecture diagram: User Browser (two shown), Web Server, Cluster Head Node, Cluster]
Basic Hadoop components
• Internals:
– Job management (per cluster)
– Task management (per computation node)
• Some components visible to the user:
– Hadoop API: Java, or arbitrary executables ("Streaming")
– Hadoop Distributed File System (HDFS)
– Support tools, including the hadoop command
– Limited job monitoring…
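With the "Streaming" interface, mappers and reducers are ordinary executables: they read lines on standard input and write tab-separated key/value lines on standard output. Hadoop sorts the mapper output by key before the reducer sees it, so a streaming reducer must notice where one key's run of lines ends. A minimal Python sketch for word count follows; the script layout and its map/reduce command-line switch are assumptions made for this example.

```python
import sys


def map_lines(lines):
    """Streaming mapper: for each input line, yield "word<TAB>1" per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Streaming reducer: input arrives sorted by key, so sum each run of equal keys."""
    current_key, total = None, 0
    for line in lines:
        key, _sep, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: "python wc.py map" or "python wc.py reduce"
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

For a quick local check, a shell pipeline such as `cat input.txt | python wc.py map | sort | python wc.py reduce` imitates the map, sort, and reduce phases (wc.py and input.txt are hypothetical file names).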
Goals
– Introduce map-reduce computing, using the WebMapReduce (WMR) simplified interface to Hadoop
• Why use map-reduce in the curriculum?
– Hands-on exercises with WMR for foundation courses
– Use of WMR for intermediate and advanced courses
• What's under the hood with WMR
• A peek at Hadoop…
– Hands-on exercises for more advanced use
WMR in advanced courses
Inverting
"Chapter 1: Call me Ishmael. Some…"
"Chapter 2: I stuffed a shirt or two..."
"Chapter 3: Entering that gable-ended..."
--> [mapper]
("call", "1"), ("me", "1"), ..., ("i", "2"), ("stuffed", "2"), ..., ("entering", "3"), ...
--> [reducer]
("a", "1,1,1,1,...,2,2,2,..."), ("aback", "3,7,7,8,..."), ...
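The inverted-index example can be sketched in Python. Assumptions: each input record pairs a chapter number with that chapter's text; words are lowercased with surrounding punctuation stripped (so "I" becomes "i", as in the mapper output above); and the reducer keeps one chapter entry per occurrence, sorted numerically. build_index() is a hypothetical helper standing in for the framework's shuffle.

```python
import string
from collections import defaultdict


def mapper(chapter, text):
    """Emit (word, chapter) for every word occurrence in one chapter."""
    pairs = []
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word:
            pairs.append((word, str(chapter)))
    return pairs


def reducer(word, chapters):
    """Join every chapter number for this word, one entry per occurrence."""
    return (word, ",".join(sorted(chapters, key=int)))


def build_index(records):
    """Run mapper, a simple in-memory shuffle, and reducer over (chapter, text) records."""
    groups = defaultdict(list)
    for chapter, text in records:
        for word, ch in mapper(chapter, text):
            groups[word].append(ch)
    return dict(reducer(word, chs) for word, chs in sorted(groups.items()))


if __name__ == "__main__":
    chapters = [
        (1, "Call me Ishmael. Some"),
        (2, "I stuffed a shirt or two"),
        (3, "Entering that gable-ended"),
    ]
    print(build_index(chapters))
```

Sorting with key=int matters once there are ten or more chapters: as plain strings, "10" would sort before "9".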
When is map-reduce appropriate?
• Massive, unstructured or irregularly structured "big data" (terascale and upward)
– Raw text
– Web pages
– XML
– Unstructured streams of data
• Other approaches may fit structured "big data"
– Scalable databases
– Large-scale statistical approaches
Using Hadoop directly (Java)
WordCount.java example
Direct Hadoop Examples
• Word count – Java
Quick questions/comments so far?
Hands-on
Overview of suggested exercises
– Computations with MovieLens2 data; multiple map-reduce cycles
– Traffic data analysis
– Network analysis using Flixster data
– The Million Song data set
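Several of these exercises chain map-reduce cycles, feeding one cycle's output into the next. A hypothetical two-cycle example in Python: cycle 1 computes each movie's average rating; cycle 2 counts how many movies have each average, truncated to whole stars. The line format "user,movie,rating,timestamp" is an assumption for illustration, not a documented property of the MovieLens2 data on the workshop cluster.

```python
from collections import defaultdict


def run_cycle(records, mapper, reducer):
    """One in-memory map-reduce cycle: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Cycle 1: average rating per movie (assumed line format: user,movie,rating,timestamp)
def avg_mapper(line):
    _user, movie, rating, _ts = line.split(",")
    return [(movie, float(rating))]


def avg_reducer(movie, ratings):
    return (movie, sum(ratings) / len(ratings))


# Cycle 2: histogram of averages, truncated to a whole number of stars
def hist_mapper(pair):
    _movie, average = pair
    return [(int(average), 1)]


def hist_reducer(stars, ones):
    return (stars, sum(ones))


if __name__ == "__main__":
    ratings = [
        "1,10,4,0", "2,10,5,0", "3,10,3,0",  # movie 10 averages 4.0
        "1,20,2,0", "2,20,1,0",              # movie 20 averages 1.5
        "3,30,4,0",                          # movie 30 averages 4.0
    ]
    averages = run_cycle(ratings, avg_mapper, avg_reducer)
    print(averages)
    print(run_cycle(averages, hist_mapper, hist_reducer))
```

On a real cluster, the output of the first cycle would be written to the DFS and read back as the input of the second.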
Discussion
Evaluations!
Links at: CSinParallel.org -> Workshops -> WMR Workshop (end of the page)
Some considerations with Hadoop
– Numbers of mappers and reducers
– DFS
– Fault tolerance
– I/O formats
– Note: we have further slides with additional information about these aspects, for you to look at on your own.
Direct Hadoop exercise setup
– Edit your own files locally
– scp to the cluster's admin node (on the cloud)
– ssh to compile and launch the job
– Percentage progress output is provided
– Multiple-cycle support via DFS
– (Clean up)
Additional Details about Hadoop
The Hadoop project documentation
• http://hadoop.apache.org/common/docs/current/index.html
How many mappers?
• The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job.
• The number of mappers is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
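A back-of-the-envelope estimate of the mapper count, assuming exactly one input split per HDFS block. Hadoop's real split calculation has extra rules (for example around a small final block), so treat this as an approximation. Each non-empty file contributes the ceiling of its size divided by the block size; the file sizes and block sizes below are made-up example values.

```python
MB = 1024 * 1024


def estimate_map_tasks(file_sizes, block_size=64 * MB):
    """Rough map-task count: one split per (possibly partial) block of each non-empty file."""
    return sum((size + block_size - 1) // block_size for size in file_sizes if size > 0)


if __name__ == "__main__":
    sizes = [200 * MB, 50 * MB, 1 * MB]  # hypothetical input files
    print(estimate_map_tasks(sizes))            # 4 + 1 + 1
    print(estimate_map_tasks(sizes, 128 * MB))  # 2 + 1 + 1
```

The 1 MB file still costs a whole map task, which is why many small input files tend to run less efficiently than a few large ones.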
How many reducers?
• The number of reducers for the job is set by the user via JobConf.setNumReduceTasks(int)
• The size of your eventual output may dictate how many reducers you choose.
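Each reduce task handles a subset of the keys and writes its own output file, so the number of reducers also sets the number of output files. By default Hadoop picks a key's reducer by hashing the key (its HashPartitioner computes, roughly, hash(key) mod number-of-reducers). The Python sketch below imitates that idea with zlib.crc32 as a stand-in hash, so its assignments will not match Hadoop's; partition() and assign_keys() are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key (crc32 stands in for Hadoop's Java hashCode())."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def assign_keys(keys, num_reducers):
    """Group distinct keys by the reducer (and so the output file) that receives them."""
    buckets = defaultdict(set)
    for key in keys:
        buckets[partition(key, num_reducers)].add(key)
    return dict(buckets)


if __name__ == "__main__":
    words = "the cat in the hat sat on the mat".split()
    for reducer_id, keys in sorted(assign_keys(words, 3).items()):
        print(f"reducer {reducer_id}: {sorted(keys)}")
```

Because every occurrence of a key hashes to the same reducer, each key's complete group ends up in exactly one output file.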
HDFS
• Fault-tolerant distributed file system modeled after the Google File System
– We've had students read the original GFS paper in an advanced course
• http://hadoop.apache.org/hdfs/docs/current/index.html
• Note the section about the file system commands you can run from the command line: hadoop fs -ls, hadoop fs -get or -put
HDFS Assumptions and Goals
• Hardware failure
– Hardware failure is the norm rather than the exception.
• Streaming data access
– Applications that run on HDFS need streaming access to their data sets. They are not general-purpose applications that typically run on general-purpose file systems.
• Large data sets
• Simple coherency model
– Read many, write once
• Moving computation is cheaper than moving data
• Portability across various hardware/software
Input/Output formats
– Input to mappers is interpreted using classes implementing the InputFormat interface, and output from reducers is written using classes implementing the OutputFormat interface.
– In WMR, mapper input and reducer output are both key-value pairs. This corresponds to using the classes KeyValueTextInputFormat and TextOutputFormat.
– In direct Hadoop, the default input format is TextInputFormat, in which values are the lines of the file and keys are the byte offsets of those lines within the file.
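The two input formats differ in what the mapper receives as its key. This Python sketch imitates both record shapes on an in-memory file: TextInputFormat-style records pair each line's byte offset with the line, while KeyValueTextInputFormat-style records split each line at the first tab (a line with no tab becomes the key, with an empty value). It mimics the record shapes only and is not Hadoop code; the function names are hypothetical.

```python
def text_input_records(data: bytes):
    """Yield (byte_offset, line) records, like TextInputFormat's (position, line) pairs."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def key_value_records(data: bytes):
    """Yield (key, value) records split at the first tab, like KeyValueTextInputFormat."""
    for _offset, line in text_input_records(data):
        key, _sep, value = line.partition("\t")
        yield key, value


if __name__ == "__main__":
    data = b"the\t3\ncat\t1\nhat\n"
    print(list(text_input_records(data)))
    print(list(key_value_records(data)))
```

This is why a WMR mapper can read the previous cycle's "key TAB value" output directly, while a direct-Hadoop mapper using the default format sees a numeric offset as its key and usually ignores it.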
Some further features of Hadoop
– Combiner, an optimization: perform some "reduction" during the map phase, after mapper() and before the shuffle
– Sorting control
• Note: hard to sort on a secondary key
– Three programming interfaces: Java; Pipes (C++); Streaming (executables)
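A combiner applies reduction logic to each mapper's local output before the shuffle, so it is only safe when reducing partial groups and then reducing the partial results gives the same final answer. This Python sketch (all helper names hypothetical) contrasts a word-count combiner, which is safe because addition is associative and commutative, with a naive averaging combiner, which is not, because an average of averages ignores how many values each partial average covered. Carrying (sum, count) pairs is one common fix, shown here as a general technique rather than a Hadoop feature.

```python
def sum_values(values):
    """Word-count style reduction: total of the values."""
    return sum(values)


def average(values):
    """Naive reduction that returns the mean of the values."""
    return sum(values) / len(values)


def sum_count(values):
    """Safer combiner for averages: keep (sum, count) instead of a mean."""
    return (sum(values), len(values))


def average_from_sum_counts(partials):
    """Final reducer matching sum_count(): total sum divided by total count."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def reduce_with_combiner(per_mapper_values, combine_fn, reduce_fn):
    """Combine each mapper's local values first, then reduce the partial results."""
    partials = [combine_fn(values) for values in per_mapper_values]
    return reduce_fn(partials)


if __name__ == "__main__":
    counts = [[1, 1, 1], [1]]  # one key's values, as produced by two mappers
    print(sum_values([1, 1, 1, 1]), reduce_with_combiner(counts, sum_values, sum_values))

    ratings = [[5, 5, 2], [2]]  # the true mean of all four ratings is 3.5
    print(average([5, 5, 2, 2]), reduce_with_combiner(ratings, average, average))
    print(reduce_with_combiner(ratings, sum_count, average_from_sum_counts))
```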
PageRank algorithm ideas
• Original data: one web page per line
– Mapper produces ("dest", "1/k Pn") for each link in page Pn, where k links appear within that page Pn
– Reducer produces ("dest", "weight_0 P1 P2 P3 P4 ..."), where weight_0 is the sum of the weights from the key-value pairs emitted by P1, P2, ...
– Subsequent mappers and reducers produce refined weights that take into account deeper chains of pages pointing to pages
– Final reducer delivers ("dest", "weight_k") [drop Pns]
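A Python sketch of the refinement cycles described above, under stated assumptions: every page starts with weight 1; each mapper sends weight/k along each of its page's k links and also forwards the page's own link list, so the next cycle still has it; and there is no damping factor and no special handling for pages without outgoing links, both of which the full PageRank algorithm includes. The record layout {page: (weight, links)} and the function names are illustrative.

```python
from collections import defaultdict


def mapper(page, weight, links):
    """Send weight/k to each of the k pages this page links to, and forward the links."""
    pairs = [(page, ("links", links))]
    for dest in links:
        pairs.append((dest, ("weight", weight / len(links))))
    return pairs


def reducer(page, values):
    """Sum the incoming weights for one page and keep its own link list."""
    weight, links = 0.0, []
    for kind, payload in values:
        if kind == "weight":
            weight += payload
        else:
            links = payload
    return page, weight, links


def pagerank_cycle(graph):
    """One map-shuffle-reduce cycle over {page: (weight, links)}."""
    groups = defaultdict(list)
    for page, (weight, links) in graph.items():
        for key, value in mapper(page, weight, links):
            groups[key].append(value)
    results = (reducer(p, vs) for p, vs in groups.items())
    return {page: (w, links) for page, w, links in results}


if __name__ == "__main__":
    graph = {"A": (1.0, ["B", "C"]), "B": (1.0, ["C"]), "C": (1.0, ["A"])}
    for _ in range(3):
        graph = pagerank_cycle(graph)
    # Final step, as on the slide: keep ("dest", weight) and drop the link lists
    print({page: w for page, (w, _links) in sorted(graph.items())})
```

In this small graph every page has an outgoing link, so the total weight stays at 3 from cycle to cycle; a page with no outgoing links would leak its weight, which is one reason the full algorithm treats such pages specially.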