Concurrent systems
Lecture 7: Crash recovery, lock-free programming, and transactional memory

Dr Robert N. M. Watson

10/21/16

Reminder from last time

• History graphs; good (and bad) schedules
• Isolation vs. strict isolation; enforcing isolation
• Two-phase locking; rollback
• Timestamp ordering (TSO)
• Optimistic concurrency control (OCC)
• Isolation and concurrency summary


This time

• Transactional durability: crash recovery and logging
  – Write-ahead logging
  – Checkpoints
  – Recovery
• Advanced topics
  – Lock-free programming
  – Transactional memory
• A few notes on supervision exercises

Crash Recovery & Logging

• Transactions require ACID properties
  – So far we have focused on I (and implicitly C).
• How can we ensure Atomicity & Durability?
  – Need to make sure that a transaction is always done entirely or not at all
  – Need to make sure that a transaction reported as committed remains so, even after a crash
• Consider for now a fail-stop model:
  – If the system crashes, all in-memory contents are lost
  – Data on disk, however, remains available after reboot

The small print: we must keep in mind the limitations of fail-stop, even as we assume it. Failing hardware/software do weird stuff. Pay attention to hardware price differentiation.

Using persistent storage

• Simplest “solution”: write all updated objects to disk on commit, read back on reboot
  – Doesn’t work, since a crash could occur during the write
  – Can fail to provide Atomicity and/or Consistency
• Instead, split the update into two stages
  1. Write proposed updates to a write-ahead log
  2. Write actual updates
• Crash during #1 => no actual updates done
• Crash during #2 => use log to redo, or undo

Write-ahead logging

• Log: an ordered, append-only file on disk
• Contains entries like <txid, obj, op, old, new>
  – ID of transaction, object modified, (optionally) the operation performed, the old value and the new value
  – This means we can both “roll forward” (redo operations) and “roll back” (undo operations)
• When persisting a transaction to disk:
  – First log a special entry <txid, START>
  – Next log a number of entries to describe operations
  – Finally log another special entry <txid, COMMIT>
• We build composite-operation atomicity from a fundamental atomic unit: the single-sector write.
  – Much like building high-level primitives over LL/SC or CAS!
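The log entry format can be illustrated with a minimal sketch. It assumes a Python list stands in for the on-disk append-only file and leaves out the optional op field; the names LogEntry, log_transaction, redo and undo are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class LogEntry:
    txid: str
    kind: str                  # "START", "UPDATE" or "COMMIT"
    obj: Optional[str] = None  # object modified (UPDATE only)
    old: Any = None            # value before the update
    new: Any = None            # value after the update


def log_transaction(log, txid, updates):
    """Append START, one UPDATE per (obj, old, new), then COMMIT."""
    log.append(LogEntry(txid, "START"))
    for obj, old, new in updates:
        log.append(LogEntry(txid, "UPDATE", obj, old, new))
    log.append(LogEntry(txid, "COMMIT"))


def redo(store, entry):
    """Roll forward: reapply the new value."""
    store[entry.obj] = entry.new


def undo(store, entry):
    """Roll back: restore the old value."""
    store[entry.obj] = entry.old


log = []
log_transaction(log, "T0", [("x", 1, 2), ("y", 17, 27)])

store = {"x": 1, "y": 17}
updates = [e for e in log if e.kind == "UPDATE"]
for e in updates:
    redo(store, e)
after_redo = dict(store)
for e in reversed(updates):   # undo in reverse log order
    undo(store, e)
after_undo = dict(store)
```

Because each entry carries both the old and the new value, the same record serves for rolling forward and for rolling back.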

Using a write-ahead log

• When executing transactions, perform updates to objects in memory with lazy write-back
  – I.e. the OS can delay disk writes to improve efficiency
• Invariant: write log records before the corresponding data
• But when we wish to commit a transaction, we must first synchronously flush a commit record to the log
  – Assume there is an fsync() or fdatasync() operation or similar which allows us to force data out to disk
  – Only report the transaction committed when fsync() returns
• Can improve performance by delaying the flush until we have a number of transactions to commit (batching)
  – Hence at any point in time we have some prefix of the write-ahead log on disk, and the rest in memory
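A minimal sketch of the commit rule and of batching, assuming one JSON record per line in a temporary file. The WriteAheadLog class and its method names are hypothetical; os.fsync() is the standard Python wrapper around fsync().

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Append-only log file; records are buffered until flush()."""

    def __init__(self, path):
        self.f = open(path, "a", encoding="utf-8")
        self.pending = []          # txids whose COMMIT is not yet durable

    def append(self, record):
        # Buffered write: may still be in memory after this returns.
        self.f.write(json.dumps(record) + "\n")

    def commit(self, txid):
        self.append({"txid": txid, "type": "COMMIT"})
        self.pending.append(txid)

    def flush(self):
        """Force buffered records to disk; return txids now safe to report."""
        self.f.flush()             # Python buffer -> OS
        os.fsync(self.f.fileno())  # OS cache -> disk
        done, self.pending = self.pending, []
        return done


path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = WriteAheadLog(path)
for txid in ("T1", "T2"):
    wal.append({"txid": txid, "type": "START"})
    wal.append({"txid": txid, "type": "UPDATE", "obj": "x", "old": 0, "new": 1})
    wal.commit(txid)
committed = wal.flush()            # one fsync covers both transactions
wal.f.close()

with open(path, encoding="utf-8") as f:
    on_disk = [json.loads(line) for line in f]
```

Here a transaction is only reported as committed once flush() returns, and a single fsync() call is shared by every transaction in the batch.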

The Big Picture

[Figure: RAM and disk each hold object values and log entries; the log runs from older entries (on disk) to newer entries (in RAM). Entries below are listed oldest first.]

RAM
  Object values: x = 3, y = 27
  Log entries (newer): T1, x, 2, 3; T2, y, 17, 27; T2, ABORT; T3, START

Disk
  Object values: x = 1, y = 17, z = 42
  Log entries (older): T0, START; T0, x, 1, 2; T0, COMMIT; T1, START; T2, START; T2, z, 40, 42

• RAM acts as a cache of disk (e.g. no in-memory copy of z)
• On-disk values may be older versions of objects, or new uncommitted values as long as the on-disk log describes rollback (e.g., z)
• Log conceptually infinite, and spans RAM & disk

Checkpoints

• As described, the log will get very long
  – And we need to process every entry in the log to recover
• Better to periodically write a checkpoint
  – Flush all current in-memory log records to disk
  – Write a special checkpoint record to the log which contains a list of active transactions
  – Flush all ‘dirty’ objects (i.e. ensure object values on disk are up to date)
  – Flush location of new checkpoint record to disk
    • (Not fatal if crash during final write)
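The checkpoint steps can be sketched as follows. This is an in-memory simulation only: plain lists and dicts stand in for the log, the object cache and the disk, and the function name write_checkpoint and the "last_checkpoint" key are hypothetical.

```python
def write_checkpoint(mem_log, disk_log, cache, disk_store, active, disk_meta):
    """Simulate the four checkpoint steps; return the checkpoint's log position."""
    # 1. Flush all current in-memory log records to disk.
    disk_log.extend(mem_log)
    mem_log.clear()
    # 2. Append a checkpoint record listing the active transactions.
    disk_log.append(("CHECKPOINT", sorted(active)))
    position = len(disk_log) - 1
    # 3. Flush dirty objects so on-disk values are up to date.
    disk_store.update(cache)
    # 4. Record where the newest checkpoint lives. A crash before this
    #    step leaves the previous location in place, which is still valid.
    disk_meta["last_checkpoint"] = position
    return position


mem_log = [("START", "T3"), ("UPDATE", "T3", "y", 17, 27)]
disk_log = [("START", "T2"), ("UPDATE", "T2", "x", 1, 2)]
cache = {"x": 2, "y": 27}
disk_store = {"x": 1, "y": 17, "z": 42}
disk_meta = {}
pos = write_checkpoint(mem_log, disk_log, cache, disk_store, {"T3", "T2"}, disk_meta)
```

Note that the log records reach disk before the dirty objects do, which preserves the write-ahead invariant.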

Checkpoints and recovery

• Key benefit of a checkpoint is that it lets us focus our attention on possibly affected transactions

[Figure: timeline showing transactions T1 to T5 relative to the checkpoint time and the failure time.]

• T1: no action required
• T2: REDO. Active at checkpoint. Has since committed; and record in log.
• T3: UNDO. Active at checkpoint; in progress at crash.
• T4: REDO. Not active at checkpoint. But has since committed, and commit record in log.
• T5: UNDO. Not active at checkpoint, and still in progress.

Recovery algorithm

• Initialize undo list U = {set of active transactions}
• Also have redo list R, initially empty
• Walk log forward from checkpoint record:
  – If see a START record, add transaction to U
  – If see a COMMIT record, move transaction from U -> R
• When hit end of log, perform undo:
  – Walk backward and undo all records for all Tx in U
• When reach checkpoint record again, redo:
  – Walk forward, and redo all records for all Tx in R
• After recovery, we have effectively checkpointed
  – On-disk store is consistent, so can truncate the log

The order in which we apply undo/redo records is important to properly handling cases where multiple transactions touch the same data.
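The recovery steps above can be sketched as a short function. The tuple record layout and the name recover are assumptions for illustration. This sketch also walks backward to the start of the log so that pre-checkpoint updates of transactions in U are undone; that reading of the undo step is an assumption, not something the slide states.

```python
def recover(log, store):
    """Replay `log` against `store` (the on-disk object values) after a crash."""
    # Locate the most recent checkpoint record.
    cp = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undo_list = set(log[cp][1])   # U: transactions active at the checkpoint
    redo_list = set()             # R: initially empty

    # Walk forward from the checkpoint to classify transactions.
    for rec in log[cp + 1:]:
        kind, txid = rec[0], rec[1]
        if kind == "START":
            undo_list.add(txid)
        elif kind == "COMMIT":
            undo_list.discard(txid)
            redo_list.add(txid)

    # Undo: walk backward, restoring old values for transactions in U.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] in undo_list:
            _, _, obj, old, _new = rec
            store[obj] = old

    # Redo: walk forward from the checkpoint, reapplying new values for R.
    for rec in log[cp + 1:]:
        if rec[0] == "UPDATE" and rec[1] in redo_list:
            _, _, obj, _old, new = rec
            store[obj] = new

    return undo_list, redo_list


log = [
    ("START", "T1"), ("UPDATE", "T1", "a", 0, 1), ("COMMIT", "T1"),
    ("START", "T2"), ("UPDATE", "T2", "b", 0, 2),
    ("START", "T3"), ("UPDATE", "T3", "c", 0, 3),
    ("CHECKPOINT", ["T2", "T3"]),
    ("UPDATE", "T2", "d", 0, 4),
    ("START", "T4"), ("UPDATE", "T4", "e", 0, 5),
    ("COMMIT", "T2"), ("COMMIT", "T4"),
    ("START", "T5"), ("UPDATE", "T5", "f", 0, 6),
    ("UPDATE", "T3", "g", 0, 7),
]
# Disk at crash: checkpoint flushed a, b, c; later writes only partly reached disk.
disk = {"a": 1, "b": 2, "c": 3, "d": 0, "e": 5, "f": 6, "g": 0}
undone, redone = recover(log, disk)
```

The example log mirrors the T1 to T5 timeline: T3 and T5 end up in U and are undone, while T2 and T4 end up in R and are redone.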

Write-ahead logging: assumptions

• What can go wrong writing commits to disk?
• Even if sector writes are atomic:
  – All affected objects may not fit in a single sector
  – Large objects may span multiple sectors
  – Trend towards copy-on-write, rather than journaled, FSes
  – Many of the problems seen with in-memory commit (ordering and atomicity) apply to disks as well!
• Contemporary disks may not be entirely honest about sector size and atomicity
  – E.g., unstable write caches to improve efficiency
  – E.g., larger or smaller sector sizes than advertised
  – E.g., non-atomicity when writing to mirrored disks
• These assume fail-stop, which is not true for some media

Transactions: summary

• Standard mutual exclusion techniques are not great for dealing with >1 object
  – intricate locking (& lock order) required, or
  – single coarse-grained lock, limiting concurrency
• Transactions allow us a better way:
  – potentially many operations (reads and updates) on many objects, but should execute as if atomically
  – underlying system deals with providing isolation, allowing safe concurrency, and even fault tolerance!
• Transactions used in databases + filesystems

Advanced Topics

• Will briefly look at two advanced topics
  – lock-free data structures, and
  – transactional memory
• Then, next time, onto a case study

Lock-free programming

• What’s wrong with locks?
  – Difficult to get right (if locks are fine-grained)
  – Don’t scale well (if locks too coarse-grained)
  – Don’t compose well (deadlock!)
  – Poor cache behavior (e.g. convoying)
  – Priority inversion
  – And can be expensive
• Lock-free programming involves getting rid of locks ... but not at the cost of safety!
• Recall TAS, CAS, LL/SC from our first lecture: what if we used them to implement something other than locks?

Assumptions

• We have a shared memory system
• Low-level (assembly instructions) include:

    val = read(addr);            // atomic read from memory
    (void) write(addr, val);     // atomic write to memory
    done = CAS(addr, old, new);  // atomic compare-and-swap

• Compare-and-Swap (CAS) is atomic
• reads value of addr (‘val’), compares with ‘old’, and updates memory to ‘new’ iff old == val, without interruption!
• something like this instruction is common on most modern processors (e.g. cmpxchg on x86, or LL/SC on RISC)
• Typically used to build spinlocks (or mutexes, or semaphores, or whatever...)
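To make the CAS semantics concrete, here is a minimal simulation. Python has no hardware CAS, so this sketch assumes a hypothetical AtomicCell class whose internal lock stands in only for the atomicity that the processor instruction would provide; the retry loop in increment is the part that carries over to real lock-free code.

```python
import threading


class AtomicCell:
    """One memory word with read, write and compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity only

    def read(self):
        return self._value

    def write(self, value):
        self._value = value

    def cas(self, old, new):
        """Set the cell to `new` iff it currently holds `old`; report success."""
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


def increment(cell):
    """Lock-free style update: read, compute, CAS, retry on failure."""
    while True:
        seen = cell.read()
        if cell.cas(seen, seen + 1):
            return seen + 1


counter = AtomicCell(0)
threads = [threading.Thread(target=lambda: [increment(counter) for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

stale = counter.cas(0, 99)   # fails: the cell no longer holds 0
```

A failed CAS means another thread changed the word between the read and the update, so the caller simply rereads and tries again.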

Lock-free approach

• Directly use CAS to update shared data
• As an example consider a lock-free linked list of integer values
  – list is singly linked, and sorted
  – Use CAS to update pointers
  – Handle CAS failure cases (i.e., races)
• Represents the ‘set’ abstract data type, i.e.
  – find(int) -> bool
  – insert(int) -> bool
  – delete(int) -> bool
• Assumption: hardware supports atomic operations on pointer-size types

Searching a sorted list

• find(20):

[Figure: sorted list H → 10 → 30 → T; the search looks for 20 between 10 and 30 and does not find it.]

find(20) -> false

Inserting an item with CAS

• insert(20):

[Figure: list H → 10 → 30 → T; new node 20 points to 30, then a CAS on the next pointer of 10 changes it from 30 to 20 and succeeds (✓).]

insert(20) -> true

Inserting an item with CAS

• insert(20):
• insert(25):

[Figure: list H → 10 → 30 → T; new nodes 20 and 25 both point to 30. Two CAS operations race on the next pointer of 10 (30 → 20 and 30 → 25): one succeeds (✓) and the other fails (✗).]
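The insert race can be sketched in code. As before, Python has no hardware CAS, so the per-node lock inside the hypothetical cas_next method models only the atomicity of the instruction. The class and method names are assumptions, and delete is omitted because it needs additional care beyond a single CAS.

```python
import threading


class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node
        self._guard = threading.Lock()  # models hardware atomicity of CAS only

    def cas_next(self, old, new):
        """Atomically set self.next to `new` iff it is still `old`."""
        with self._guard:
            if self.next is old:
                self.next = new
                return True
            return False


class SortedSet:
    def __init__(self):
        self.tail = Node(float("inf"))               # sentinel T
        self.head = Node(float("-inf"), self.tail)   # sentinel H

    def _search(self, key):
        """Return adjacent nodes (prev, curr) with prev.key < key <= curr.key."""
        prev, curr = self.head, self.head.next
        while curr.key < key:
            prev, curr = curr, curr.next
        return prev, curr

    def find(self, key):
        return self._search(key)[1].key == key

    def insert(self, key):
        while True:
            prev, curr = self._search(key)
            if curr.key == key:
                return False               # already present
            node = Node(key, curr)         # new node points at its successor
            if prev.cas_next(curr, node):  # e.g. CAS(10.next, 30 -> 20)
                return True
            # CAS failed: another thread changed prev.next, so search again.


s = SortedSet()
s.insert(10)
s.insert(30)
prev, curr = s._search(25)                       # thread B sees 10 -> 30
first = s.insert(20)                             # thread A wins the race
stale_cas = prev.cas_next(curr, Node(25, curr))  # B's CAS fails: 10.next is now 20
retry = s.insert(25)                             # B searches again and succeeds

keys = []
n = s.head.next
while n is not s.tail:
    keys.append(n.key)
    n = n.next
```

The interleaving is replayed sequentially here to keep the outcome deterministic: the stale CAS fails because the next pointer of 10 no longer refers to 30, and the retry finds the new predecessor 20.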


Concurrent find + insert

• find(20) -> false
• insert(20) -> true

[Figure: list H → 10 → 30 → T with node 20 being inserted between 10 and 30 while find(20) searches. Annotations: "This thread saw 20 was not in the set..." (find) and "...but this thread succeeded in putting it in!" (insert).]

• Is this a correct implementation of a set?
• Should the programmer be surprised if this happens?
• What about more complicated mixes of operations?

Linearisability

• As with transactions, we return to a conceptual model to define correctness
  – a lock-free data structure is ‘correct’ if all changes (and return values) are consistent with some serial view: we call this a linearisable schedule
• Hence in the previous example, we were ok:
  – can just deem the find() to have occurred first
• Gets a lot more complicated for more complicated data structures & operations!
• NB: On current hardware, synchronisation does more than just provide atomicity
  – Also provides ordering: “happens-before”
  – Lock-free structures must take this into account as well

Transactional Memory (TM)

• Steal idea from databases!
• Instead of:

    lock(&mylock);
    shared[i] *= shared[j] + 17;
    unlock(&mylock);

• Use:

    atomic {
        shared[i] *= shared[j] + 17;
    }

• Has “obvious” semantics, i.e. all operations within block occur as if atomically
• Transactional since under the hood it looks like:

    do {
        txid = tx_begin(&thd);
        shared[i] *= shared[j] + 17;
    } while (!tx_commit(txid));
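The begin/commit retry loop can be made concrete with a toy software TM. This is a sketch under stated assumptions rather than how any real TM system works: it uses optimistic concurrency control with per-cell version numbers, buffers writes until commit, and serialises commits with a lock. The TMemory class and the tx_read and tx_write helpers are hypothetical; tx_begin and tx_commit borrow their names from the pseudocode.

```python
import threading


class TMemory:
    """Toy software TM: versioned cells, buffered writes, validate at commit."""

    def __init__(self, values):
        self.values = list(values)
        self.versions = [0] * len(values)
        self._commit_lock = threading.Lock()  # serialises commits only

    def tx_begin(self):
        return {"reads": {}, "writes": {}}

    def tx_read(self, tx, i):
        if i in tx["writes"]:
            return tx["writes"][i]            # read our own buffered write
        tx["reads"].setdefault(i, self.versions[i])  # remember version seen
        return self.values[i]

    def tx_write(self, tx, i, value):
        tx["writes"][i] = value               # buffered until commit

    def tx_commit(self, tx):
        with self._commit_lock:
            if any(self.versions[i] != v for i, v in tx["reads"].items()):
                return False                  # conflict: caller retries
            for i, value in tx["writes"].items():
                self.values[i] = value
                self.versions[i] += 1
            return True


def atomic_update(tm, i, j):
    """shared[i] *= shared[j] + 17, retried until it commits."""
    while True:
        tx = tm.tx_begin()
        tm.tx_write(tx, i, tm.tx_read(tx, i) * (tm.tx_read(tx, j) + 17))
        if tm.tx_commit(tx):
            return


tm = TMemory([2, 3])
atomic_update(tm, 0, 1)
after_first = tm.values[0]

stale = tm.tx_begin()
tm.tx_read(stale, 0)               # stale transaction reads cell 0
atomic_update(tm, 0, 1)            # another transaction commits first
tm.tx_write(stale, 0, -1)
conflict = tm.tx_commit(stale)     # validation fails, nothing is written
```

A failed commit leaves memory untouched, which is why the do/while loop can simply rerun the block.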

TM advantages

• Simplicity:
  – Programmer just puts atomic { } around anything he/she wants to occur in isolation
• Composability:
  – Unlike locks, atomic { } blocks nest, e.g.:

    credit(a, x) = atomic {
        setbal(a, readbal(a) + x);
    }
    debit(a, x) = atomic {
        setbal(a, readbal(a) - x);
    }
    transfer(a, b, x) = atomic {
        debit(a, x);
        credit(b, x);
    }

TM advantages (continued)

• Cannot deadlock:
  – No locks, so don’t have to worry about locking order
  – (Though may get livelock if not careful)
• No races (mostly):
  – Cannot forget to take a lock (although you can forget to put atomic { } around your critical section ;-))
• Scalability:
  – High performance possible via OCC
  – No need to worry about complex fine-grained locking
• There is still a simplicity vs. performance tradeoff
  – Too much atomic { } and implementation can’t find concurrency. Too little, and race conditions.

TM is very promising…

• Essentially does ‘ACI’ but no D
  – no need to worry about crash recovery
  – can work entirely in memory
  – some hardware support emerging (or promised)
• But not a panacea
  – Contention management can get ugly
  – Difficulties with irrevocable actions (e.g. IO)
  – Still working out exact semantics (type of atomicity, handling exceptions, signaling, ...)
• Recent x86 hardware has started to provide direct support for transactions; not widely used
  – …And promptly withdrawn in errata
  – Now back on the street again, but very new

Supervision questions + exercises

• Supervision questions
  – S1: Threads and synchronisation
    • Semaphores, priorities, and work distribution
  – S2: Transactions
    • ACID properties, 2PL, TSO, and OCC
  – Other C&DS topics also important, of course!
• Optional Java practical exercises
  – Java concurrency primitives and fundamentals
  – Threads, synchronisation, guarded blocks, producer-consumer, and data races

Concurrent systems: summary

• Concurrency is essential in modern systems
  – overlapping I/O with computation
  – exploiting multi-core
  – building distributed systems
• But throws up a lot of challenges
  – need to ensure safety, allow synchronization, and avoid issues of liveness (deadlock, livelock, ...)
• Major risk of over-engineering
  – generally worth building sequential system first
  – and worth using existing libraries, tools and design patterns rather than rolling your own!

Summary + next time

• Transactional durability: crash recovery and logging
  – Write-ahead logging; checkpoints; recovery
• Advanced topics
  – Lock-free programming
  – Transactional memory
• Notes on supervision exercises
• Next time:
  – Concurrent system case study: the FreeBSD kernel
  – Brief history of kernel concurrency
  – Primitives and debugging tools
  – Applications to the network stack