ZooKeeper Continued and MapReduce - Brown...


Page 1: ZooKeeper Continued and MapReduce - Brown University · cs.brown.edu/courses/cs138/s18/lectures/L22.pdf · 2018-04-30

ZooKeeper Continued and MapReduce

ZooKeeper and Chubby


Today
•  ZooKeeper wrap-up
•  MapReduce (big data analytics at Google)
•  Grades
•  Closing remarks
•  Critical reviews


API: ZooKeeper vs. Chubby

ZooKeeper                                Chubby
(none)                                   Close/Open()
delete(path, expectedVersion)            Delete()
create(path, data, acl, flags)
setData(path, data, expectedVersion)     setContent()
getData(path, watch)                     getContentAndStat()
getChildren(path, watch)                 use getContent() on directory
exists(path, watch)
sync(path)

Lock-related calls
   ZooKeeper: none
   Chubby: Acquire()/TryAcquire(), Release

Sequence number calls
   ZooKeeper (implicitly managed):
   •  Flag passed to create() requests version
   •  ZK increments ID after creating files
   •  ID is used as expectedVersion
   Chubby (explicitly managed):
   •  GetSequencer()
   •  SetSequencer()
   •  CheckSequencer()

Notes on ZooKeeper
•  No need for open/close because all calls have the path in them
•  No locks


Read/Write Interaction

[Figure: two diagrams, one labeled ZooKeeper and one labeled Chubby, each showing a node L and nodes labeled C1. Reads are drawn in blue, writes in red.]

•  Writes: linearizable (go through the leader)
•  Reads: in ZooKeeper, can be served locally by any node. In Chubby, must go through the leader
   –  Weaker read semantics
•  Requests: clients can send multiple requests at a time
   –  Requests served in FIFO order
•  TCP provides FIFO
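The weaker read semantics can be illustrated with a toy model. This is a minimal sketch, not ZooKeeper's replication protocol: the `Leader` and `Replica` classes and the `pending` queue are hypothetical names, and `sync()` here only stands in for the idea behind `sync(path)`.

```python
class Replica:
    """A follower that applies the leader's updates in order, possibly late."""

    def __init__(self):
        self.data = {}
        self.pending = []  # updates received from the leader, not yet applied

    def read(self, path):
        # Local read: served from this replica's state, which may lag the leader.
        return self.data.get(path)

    def sync(self):
        # Apply everything the leader has sent so far, in FIFO order.
        for path, value in self.pending:
            self.data[path] = value
        self.pending.clear()


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.followers = followers

    def write(self, path, value):
        # All writes go through the leader, which fixes a single order.
        self.data[path] = value
        for follower in self.followers:
            follower.pending.append((path, value))


follower = Replica()
leader = Leader([follower])
leader.write("/theoapp/config", "v1")
follower.sync()
leader.write("/theoapp/config", "v2")
stale = follower.read("/theoapp/config")  # the replica has not applied v2 yet
follower.sync()
fresh = follower.read("/theoapp/config")
```

In this model a local read can return an older value than the latest write, which is the price of serving reads from any node.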


[Figure: file tree under /theoapp/ with children .../config, .../ready, .../members/ (.../IP1, .../IP2, .../IP3) and .../leader/ (.../cand1, .../cand2, .../cand3). Nodes labeled C1 are attached to the ephemeral files and to the watched files.]

Leader Election (getting locks)
•  Try to create the file "/theoapp/leader": the first to create it is the leader
•  The file is ephemeral. If the leader dies, the file dies
•  Monitor the file; when it disappears, try to create it and become leader
•  Everyone detects the deletion and tries to create the file at the same time
•  Creates lots of inefficiency

Group Membership
•  Ephemeral files
•  Nodes watch for updates
•  Nodes read the children

Leader Changing Node Configuration
•  First delete "/theoapp/ready"
•  Clients get notification of the deletion. If a client is trying to read, it stops once the notification arrives
•  Update "/theoapp/config"
•  Create "/theoapp/ready"
•  Clients get notification of the creation. Now they know the configs have changed and they can read the config file
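The leader election recipe can be sketched with an in-memory stand-in for the file tree. This is a toy under stated assumptions: `ToyZK`, `watch_delete`, `close_session`, and `run_for_leader` are hypothetical names for illustration, not the real ZooKeeper client API.

```python
class ToyZK:
    """In-memory stand-in for the file tree; not a real ZooKeeper client."""

    def __init__(self):
        self.nodes = {}    # path -> (data, owning session or None)
        self.watches = {}  # path -> callbacks fired once when the path is deleted

    def create(self, path, data, owner=None):
        if path in self.nodes:
            return False  # someone else created the file first
        self.nodes[path] = (data, owner)
        return True

    def watch_delete(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def delete(self, path):
        self.nodes.pop(path, None)
        for callback in self.watches.pop(path, []):  # watches are one-shot
            callback()

    def close_session(self, owner):
        # Ephemeral files disappear when the session that created them ends.
        for path in [p for p, (_, o) in self.nodes.items() if o == owner]:
            self.delete(path)


LEADER = "/theoapp/leader"


def run_for_leader(zk, name):
    if not zk.create(LEADER, name, owner=name):
        # Lost the race: watch the file and retry when it disappears.
        zk.watch_delete(LEADER, lambda: run_for_leader(zk, name))


zk = ToyZK()
for candidate in ["cand1", "cand2", "cand3"]:
    run_for_leader(zk, candidate)
first_leader = zk.nodes[LEADER][0]
zk.close_session("cand1")  # the leader dies, so its ephemeral file dies
second_leader = zk.nodes[LEADER][0]
```

Every waiting candidate is woken when the file disappears and all of them race to create it again, which is the inefficiency the slide points out.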


ZooKeeper v. Chubby
•  Lessons from Chubby
   –  Most requests are read/keep-alive
   –  Few developers use locks
   –  File system API is easy to use
•  ZooKeeper = Chubby with weaker read semantics and no locks
   –  Weaker read semantics: a client can read from any node
   –  Writes must still go through the leader
   –  While there is no lock, you can implement locks
   –  Enables async requests: FIFO execution


MapReduce: Big Data Analytics at Google


Google Environment
•  Lots (tens of thousands) of computers
   –  all more-or-less equal
      •  processor, disk, memory, network interface
   –  no specialized servers
   –  even if only .01% are down at any one moment, many will be down
•  Simple jobs become complicated
   –  Lots of servers -> scale to many nodes
      •  Partition data
      •  Partition processing/computing
   –  Commodity servers -> fault tolerance


MapReduce
•  MapReduce: language API library to hide complexity
   –  Performance
   –  Availability
   –  Scalability
•  All queries are modeled as Map and Reduce
   –  Map: take a set of data entries and apply an operator to them
   –  Reduce: take the intermediate data and combine it


MapReduce
•  map
   –  for each pair in a set of key/value pairs, produce a set of new key/value pairs
•  reduce
   –  for each key
      •  look at all the values associated with that key and compute a smaller set of values


Implementation Sketch (1)

[Figure: input (on GFS) divided into split 0, split 1, ..., split M-1; a master and several workers; map phase: M workers; intermediate files (on local disks) partitioned into R pieces; reduce phase: R workers; output files (on GFS)]


Example
Goal: count the number of times a word appears in all documents

    map(String key, String value) {
      // key: document name
      // value: document contents
      for each word w in value
        EmitIntermediate(w, 1);
    }

    reduce(String key, Iterator values) {
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values
        result += v;
      Emit(result);
    }

Map: find all the words; for each occurrence, create <word, 1>
Reduce: sum up all pairs with the same word (key)
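For readers who want to run the word count example, here is a minimal single-process sketch. It assumes whitespace-separated words, and `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative names rather than part of any MapReduce library; the grouping step stands in for what the framework does between the two phases.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield (word, 1)


def reduce_fn(key, values):
    # key: a word, values: an iterable of counts
    return sum(values)


def run_mapreduce(documents):
    # Group intermediate values by key, as the framework would between phases.
    groups = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            groups[word].append(count)
    return {word: reduce_fn(word, counts) for word, counts in groups.items()}


counts = run_mapreduce({"doc1": "the quick fox", "doc2": "the lazy dog the end"})
```

The user only writes the two small functions; everything in `run_mapreduce` is the part a real system would distribute across workers.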


Implementation Sketch (2)

•  Map's input pairs divided into M splits
   –  stored in GFS
   –  Split allows for parallelism
•  Output of Map / input of Reduce divided into R pieces
•  One master process is in charge: farms out work to W (<< M+R) worker machines

[Figure: input (on GFS) divided into split 0, split 1, ..., split M-1, feeding workers in the map phase: M workers]

Three types of processes
•  Master
•  Reducer (worker)
•  Mapper (worker)


Implementation Sketch (3)

•  Master partitions splits among some of the workers
   –  each worker passes pairs to the user-supplied map function
   –  results stored in local files
      •  partitioned into pieces
         –  e.g., hash(key) mod R
   –  remaining workers perform reduce tasks
      •  the R pieces are partitioned among them
      •  place remote procedure calls to map workers to get data
      •  put output in GFS

[Figure: input splits (on GFS), master and workers, map phase: M workers, intermediate files (on local disks) partitioned into R pieces, reduce phase: R workers, output files (on GFS). Annotation: "Stored locally and will be lost on worker failure"]
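The hash(key) mod R partitioning can be sketched as follows. This is a minimal illustration: `partition` and `write_intermediate` are hypothetical names, in-memory lists stand in for the R local files, and CRC32 is used only because it is deterministic across processes (Python's built-in `hash()` for strings is not).

```python
import zlib


def partition(key, R):
    # Deterministic hash, so every map worker sends a given key to the same piece.
    return zlib.crc32(key.encode("utf-8")) % R


def write_intermediate(pairs, R):
    # One bucket per reduce task; a real map worker would write R local files.
    pieces = [[] for _ in range(R)]
    for key, value in pairs:
        pieces[partition(key, R)].append((key, value))
    return pieces


pieces = write_intermediate([("the", 1), ("fox", 1), ("the", 1)], R=4)
```

Because all pairs with the same key land in the same piece, a single reduce task sees every value for that key.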


Committing Data

•  Map task
   –  output kept in R local files
   –  locations sent to master only on task completion
•  Reduce task
   –  output stored on GFS using a temporary name
   –  file atomically renamed on task completion (to final name)

[Figure: input splits (on GFS), master and workers, map phase: M workers, intermediate files (on local disks) partitioned into R pieces, reduce phase: R workers, output files (on GFS)]
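The reduce-side commit (write under a temporary name, then rename atomically) can be sketched on a local filesystem. This is only an analogy for GFS: `commit_output` and the file name `part-00000` are hypothetical, and the atomicity relies on `os.replace` with source and target on the same filesystem.

```python
import os
import tempfile


def commit_output(final_path, lines):
    # Write to a temporary name in the same directory as the final file.
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-reduce-")
    with os.fdopen(fd, "w") as f:
        for line in lines:
            f.write(line + "\n")
    # Readers see either no file or the complete file, never a partial one.
    os.replace(tmp_path, final_path)


out_dir = tempfile.mkdtemp()
final = os.path.join(out_dir, "part-00000")
commit_output(final, ["the\t3", "fox\t1"])
```

If a reduce task is executed twice, the second rename simply overwrites the first with equivalent content, so a crash mid-write never leaves a half-written final file.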


Coping with Failure (1)

•  Master maintains state of each task
   –  idle (not started)
   –  in progress
   –  completed
•  Master pings workers periodically to determine if they're up


Coping with Failure (2)

•  Worker crashes
   –  in-progress tasks have state set back to idle
      •  all output is lost
      •  restarted from beginning on another worker
   –  completed map tasks
      •  all output lost
      •  restarted from beginning on another worker
      •  reduce tasks using output are notified of new worker


Coping with Failure (3)

•  Worker crashes (continued)
   –  completed reduce tasks
      •  output already on GFS
      •  no restart necessary
•  Master crashes
   –  could be recovered from checkpoint
   –  in practice
      •  master crashes are rare
      •  entire application is restarted
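The worker-crash rules from the three failure slides can be summarized in a short sketch of the master's bookkeeping. The `Task` class and `handle_worker_failure` function are hypothetical names; a real master would also reschedule the idle tasks and notify reduce workers.

```python
from dataclasses import dataclass
from typing import Optional

IDLE, IN_PROGRESS, COMPLETED = "idle", "in progress", "completed"


@dataclass
class Task:
    kind: str  # "map" or "reduce"
    state: str = IDLE
    worker: Optional[str] = None


def handle_worker_failure(tasks, dead_worker):
    """Reset the tasks that must be re-executed after a worker crash."""
    for task in tasks:
        if task.worker != dead_worker:
            continue
        lost_in_progress = task.state == IN_PROGRESS
        # Completed map output lives on the dead worker's local disk, so it is lost.
        lost_map_output = task.state == COMPLETED and task.kind == "map"
        if lost_in_progress or lost_map_output:
            task.state, task.worker = IDLE, None
        # Completed reduce output is already on GFS: no restart necessary.


tasks = [
    Task("map", COMPLETED, "w1"),
    Task("map", IN_PROGRESS, "w2"),
    Task("reduce", COMPLETED, "w1"),
    Task("reduce", IN_PROGRESS, "w1"),
]
handle_worker_failure(tasks, "w1")
states = [t.state for t in tasks]
```

The asymmetry between map and reduce tasks comes entirely from where their output is stored.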


Retrospective
•  MapReduce -> Yahoo Hadoop (now Apache)
   –  Years of research created Spark
•  See: http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html
   1)  a giant step backward in the programming paradigm for large-scale data intensive applications
   2)  a sub-optimal implementation, in that it uses brute force instead of indexing
   3)  not novel at all: it represents a specific implementation of well known techniques developed nearly 25 years ago
   4)  missing most of the features that are routinely included in current DBMS
   5)  incompatible with all of the tools DBMS users have come to depend on


Current Grading

•  HW2 and HW3 done. Will be released today
   –  HW2: median: 90, std-dev: 10
   –  HW3: median: 91, std-dev: 9
•  Projects
   –  Tapestry being finished
   –  Will start Raft grading this weekend
   –  Grading takes a while due to partial credit


Final Grades

•  Course Rubric
   –  Projects: 50%
   –  HWs: 20%
   –  Midterm: 10%
   –  Final: 20%
•  Individual projects/midterm have been curved
•  Final grade will also be curved


Closing Remarks
•  Distributed systems: the art of providing consensus while tackling failures and providing high performance
•  Failure v. Performance v. Correctness/consensus
   –  Different types of failures -> different implications (different detectors)
      •  Mostly heartbeats
   –  Depends on the application: sometimes you don't need linearizability (total ordering) on all events
      •  ZooKeeper: reads are not linearizable
      •  Dynamo/Cassandra: reads/writes are causally consistent
   –  Performance: shard and distribute
      •  Consistent hashing to find shards
•  Final on Monday 5/14 at 2pm
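The consistent hashing mentioned in the closing remarks can be sketched as a ring lookup. This is a minimal version without virtual nodes or replication; the `Ring` class, the shard names, and the choice of MD5 as the ring hash are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, shards):
        # Each shard sits at the point on the ring given by the hash of its name.
        self.points = sorted((_hash(s), s) for s in shards)
        self.keys = [h for h, _ in self.points]

    def lookup(self, key):
        # The first shard clockwise from the key's hash owns the key.
        i = bisect.bisect_right(self.keys, _hash(key)) % len(self.points)
        return self.points[i][1]


ring = Ring(["shard-a", "shard-b", "shard-c"])
bigger = Ring(["shard-a", "shard-b", "shard-c", "shard-d"])
keys = [f"user{i}" for i in range(100)]
moved = [k for k in keys if ring.lookup(k) != bigger.lookup(k)]
```

Adding a shard only moves keys onto the new shard; keys never shuffle between the existing shards, which is why the scheme suits systems that grow and shrink.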