WORKING SET ANALYTICS
6/15/20 -- DRAFT v2
Peter J. Denning
Naval Postgraduate School, Monterey, CA

Abstract

The working set model for program behavior was invented in 1965. It has stood the test of time in virtual memory management for over fifty years. It is considered the ideal for managing memory in operating systems and caches. Its superior performance was based on the principle of locality, which was discovered at the same time; locality is the observed tendency of programs to use distinct subsets of their pages over extended periods of time. This tutorial traces the development of working set theory from its origins to the present day. We will discuss the principle of locality and its experimental verification. We will show why working set memory management resists thrashing and generates near-optimal system throughput. We will present the powerful, linear-time algorithms for computing working set statistics and applying them to the design of memory systems. We will debunk several myths about locality and the performance of memory systems. We will conclude with a discussion of the application of the working set model in parallel systems, modern shared CPU caches, network edge caches, and inventory and logistics management.

Keywords: Working set, working set model, program behavior, virtual memory, cache, paging policy, locality, locality principle, thrashing, multiprogramming, memory management, optimal paging

Virtual memory made its public debut at the University of Manchester in 1962. It was hailed as a breakthrough in automatic memory management and enjoyed an enthusiastic reception. By the mid-1960s, however, operating system engineers had become skeptical of virtual memory. Its performance was untrustworthy and it was prone to unpredictable thrashing. In 1965, I joined Project MAC at MIT, which was developing Multics. The designers of Multics did not want their virtual memory to succumb to these problems. My PhD research project was to find a new approach to managing virtual memory that would make its performance trustworthy, reliable, and near-optimal. The new approach had two parts. The first was the Working Set Model, which was based on the idea of measuring the intrinsic memory demands of individual programs; working sets then determined how many main memory slots were needed and what pages should be loaded into them. The second part was the Principle of Locality, which was the strong tendency of executing programs to confine their references to limited locality sets over extended phases. After extensive experimental verification, the locality principle was accepted as a universal law of computing. The locality principle made it possible to prove that working-set memory management is near-optimal and immune to thrashing. These discoveries enabled the success of virtual memory on Multics and on commercial operating systems.

Since the 1970s, every operating system textbook has discussed the working set model. Further, every modern operating system uses the working set model as an ideal to underpin its approach to memory management [den16]. This can be seen, for example, by looking at the process activity control panels in Windows, where you will see “working set” mentioned in the memory use column. A search of the US patent database shows over 12,000 patents that base their claims on the working set or locality theories.

In this tutorial, I will trace how all this came to be and will show you the powerful algorithms we developed to compute working set statistics and use them to design memory systems.

The Growing Pains of Virtual Memory

In 1959 Tom Kilburn and his team, who were building the Atlas computer and operating system at the University of Manchester, UK, invented the first virtual memory [fot61]. They put it into operation in 1962 [kil62]. Virtual memory managed the content of the small main memory by automatically transferring pages between it and the secondary memory. Kilburn argued it would improve programmer productivity by automating the labor-intensive work of planning memory transfers. It was immediately seen by many as an ingenious solution to the complex problems of managing information flows in the memory hierarchy.

The Manchester designers introduced four innovations that were soon adopted as standards in computer architecture and operating systems and have remained so to the present day. One was the page, a fixed-size unit of storage and transfer. Programs and data were divided into pages and stored in main memory slots called page frames with copies on the secondary memory. A second innovation was the distinction between addresses (names of values) and locations (memory slots holding values). The address space was a large linear sequence of addresses and the main memory (RAM) a pool of page frames, each of which could hold any page from the address space. As the CPU generated addresses into its address space, a hardware memory mapping unit translated addresses to locations, using a page table that associated pages with page frames. A third innovation was the page fault, an interrupt triggered in the operating system when an executing program presented the mapping unit with an address whose page was not in main memory; the operating system located the missing page in secondary memory, chose a loaded page to evict, and transferred the missing page into the vacated page frame. A fourth innovation was the page replacement policy, the algorithm that chooses which page must be evicted from main memory and returned to secondary memory to make way for an incoming page. The miss rate function – fraction of references that produce a page fault – was the key performance measure for virtual memories.

Performance of operating systems has always been a big deal. To be accepted into the operating system, a virtual memory system had to be efficient. The two potential sources of inefficiency were in address mapping and page transfers. They can be called the Addressing Problem and the Replacement Problem.

The Addressing Problem yielded quickly to efficient solutions. A translation lookaside buffer, which was a small cache in the memory mapping unit, limited the slowdown from address mapping to 3% of the normal RAM access time. This was an acceptable cost for all the benefits.¹ Virtual memory did indeed double or triple programmer productivity. It also eliminated the annoying problem that hand-crafted transfer schedules had to be redone for each different size of main memory. Even more, virtual memory enabled the partitioning of main memory into disjoint sets, one for each address space, supporting a primal objective to prevent executing jobs from interfering with one another. This was all good news to the designers and users of early operating systems.

The Replacement Problem was much more difficult and controversial. Page transfers were very costly because the speed gap between main and secondary memory was a factor of 10,000 or more. It was critical to find replacement policies that minimized page transfers. Early tests brought good news: the page transfer schedules generated automatically by the virtual memory usually generated fewer page moves than hand-crafted schedules [say69]. Because the main memory was very expensive and small,² even the cleverest programs were likely to generate a lot of page faults. Despite the great pressure on them to find good replacement policies, designers of virtual memories found no clear winners; by 1965 there was considerable heated debate and conflicting data about which replacement policy would be the best. Many engineers began to harbor doubts about whether virtual memory could be counted on to give good performance to everyone using a computer system [ran68].

¹ The term “job” is used throughout this tutorial to mean any of process, thread, or executing program. It denotes a computational task doing work for a user or the system.

² Main memory was very expensive, at least $0.25 a byte; today a gigabyte (GB) costs $5.00, about two hundred-millionths of that per byte. Even though memory access times have become much smaller, the speed gap between the main memory (RAM) and secondary memory (usually DISK) has risen from 10^4 in 1960 to 10^6 today. Page faults have never been cheap.

The Atlas designers, keenly aware of the high cost of paging, invented an ingenious replacement policy they called the “learning algorithm”. It assumed that typical pages spent the majority of their time being visited in well-defined loops. It measured the period of each page’s loop and chose for replacement the page not expected to be accessed again for the longest time into the future. The learning algorithm employed the optimality principle that IBM’s Les Belady made precise in his MIN paging algorithm in 1966 [bel66]. Unfortunately, the learning algorithm did not work well for programs that did not have tight, well-defined loops; it did not find favor among the engineers searching for robust paging algorithms.

The innovation of multiprogramming in the mid 1960s added to the consternation about virtual memory performance. With multiprogramming, several jobs could be loaded simultaneously into separate partitions of the main memory, yielding significant improvements of CPU efficiency and system throughput. But multiprogrammed virtual memories brought a host of new problems. How many programs should be loaded? How much memory space should each program get? Should space allocations be allowed to vary? In the process of trying to answer these questions, engineers discovered that virtual memory systems were prone to a new, unexpected, and very serious problem: thrashing.

Thrashing is the sudden collapse of CPU efficiency and system throughput when too many programs are loaded into main memory at once. It is a catastrophic instability triggered when the paging policy steals pages from other programs to satisfy page faults. This condition, originally called “paging to death”, could be triggered by adding just one more program to the main memory. The trigger threshold was unpredictable. Who would purchase a million-dollar computer system whose performance could suddenly collapse at random and unpredictable times?

Thus, by 1965, operating systems designers were facing an enormous challenge. Could they design policies for managing virtual memory that minimized page faults and did not thrash?

The answer is affirmative: the working set policy became a classic ideal for managing memory [den16]. All the major modern operating systems – today including Windows, MacOS, and Linux – use memory management policies inspired by the working set model.

Into the Storm

I joined MIT Project MAC at the start of the Multics project in 1965. Multics planned to have a multiprogrammed virtual memory. The designers were deeply worried that they could wind up like some of the commercial operating systems that were adopting multiprogrammed virtual memory – hobbled by excessive paging and susceptible to thrashing. I became fascinated with these questions and took them on for my PhD work. Jerry Saltzer posed the research question in a nice way: considering Multics as a black box, can you design an automatic control mechanism for the virtual memory with a single, tunable, optimizing parameter? [den80]

I set out to find answers for the skeptics who seriously doubted that virtual memory could be stabilized. Their primary concern was that none of the many page replacement policies worked consistently well. They also knew that no replacement policy works well for some common problems. For example, matrix multiplication, a mainstay of graphics and neural networks, is very fast if the matrices are all in memory. But if they are stored with rows or columns on separate pages and memory cannot hold them all, matrix multiplication generates enormous amounts of paging and grinds to a near halt.

In 1966, Les Belady of IBM Watson Research Lab published a famous experimental study comparing a large number of replacement algorithms [bel66]. One of his findings was that replacement policies relying on use bits to decide which pages to keep in main memory performed better than others.³ He took this as evidence of a “locality property” – processes tend to reuse pages most recently used in the past. In late 1965 I had independently reached a similar conclusion, which I proposed to harness with a “working set”, defined as the pages used during a backward-looking sampling window of virtual time.⁴ The working set would see the most recently used pages and the operating system would protect them from being paged out [den68a, den72]. The operating system could prevent thrashing by never allowing a page fault to steal a page from another working set [den68b]. Belady and I began a collaboration in 1967, in which we postulated that all programs obey a “locality principle” that could be usefully exploited by paging algorithms.

³ A use bit is a hardware bit associated with each page frame. When the page is accessed (read or written) the use bit is set to 1 by the addressing hardware. The operating system can scan for used pages and reset the bits to 0. Unused pages are considered inactive and are first to be removed from main memory when space must be freed up.

⁴ Virtual time is discrete time measured as number of memory accesses; it is not interrupted by external events such as page faults or other input-output.

Page Reference Maps

Page reference maps were a useful tool in early virtual memory research. A map shows time on the x-axis and pages on the y-axis. Each point of the time axis represents a sample interval and above it is a column of darkened pixels marking the pages used in that sample interval. (An example appears in Figure 1.) The sampling interval is T time units, where one time unit is a single page reference. The pages used in a sampling interval are the locality set of that interval. A phase is a sequence of sampling intervals over which the locality set is unchanged. These maps clearly showed that jobs accessed small subsets of their total address spaces for extended periods. No program with a random reference map was ever observed. These maps were strong evidence of a universal Locality Principle exhibited by all executing programs.
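The definitions of sampling interval, locality set, and phase can be made concrete with a short Python sketch. It is an illustration only, not the tooling used to produce any figure in this tutorial: the function names `locality_sets` and `phases` and the toy trace are assumptions introduced here, and real traces rarely repeat a locality set exactly from one interval to the next.

```python
from typing import List, Set, Tuple

def locality_sets(trace: List[int], T: int) -> List[Set[int]]:
    """Cut a page-reference trace into sampling intervals of T references
    and return the set of pages used in each interval (one map column)."""
    return [set(trace[i:i + T]) for i in range(0, len(trace), T)]

def phases(columns: List[Set[int]]) -> List[Tuple[Set[int], int]]:
    """Group consecutive intervals with an unchanged locality set.
    Returns (locality_set, phase_length_in_intervals) pairs."""
    result: List[Tuple[Set[int], int]] = []
    for col in columns:
        if result and result[-1][0] == col:
            result[-1] = (col, result[-1][1] + 1)   # same set: phase continues
        else:
            result.append((col, 1))                  # set changed: new phase
    return result

# Hypothetical trace: a phase on pages {1, 2}, then a phase on pages {7, 8}.
trace = [1, 2, 1, 2, 2, 1, 1, 2, 7, 8, 7, 8, 8, 7, 7, 8]
print(phases(locality_sets(trace, T=4)))
```

Each element of the list returned by `locality_sets` corresponds to one column of a page reference map; `phases` then reads the map left to right and reports where the column changes.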

When he undertook his research on paging in Linux around 2009, a half century after the invention of virtual memory, Adrian McMenamin encountered a lot of skepticism among Linux system programmers about locality [mcm11]. They regarded it as an obsolete idea from the early days of virtual memory, no longer relevant because modern computers have so much more memory. To the contrary, McMenamin found out that Linux programs display locality – and it is even more pronounced than in early virtual memories. Why would the removal of memory constraints lead to more pronounced locality? The reason appears to be that unconstrained programmers built more modular programs: the phases are intervals of a particular module’s use.

McMenamin demonstrated locality by recording the page reference maps from a significant sample of Linux programs. Every program had clearly identifiable locality sets and phases – a unique locality-phase “signature”.⁵ He concluded that the skepticism was misplaced and that considerable performance benefits will come to systems that exploit the locality behavior. Page reference maps remain a powerful tool for visualizing locality and the operation of memory management policies.

⁵ The only known exception is a data scanner, a program that examines each item in a data sequence just once and then discards it. The best replacement policy in this case evicts a page just after it is used.

Figure 1. This is a page reference map of the Firefox Web browser in a Linux system. The horizontal axis represents virtual time, divided into equal sample windows of about 380K references, and the vertical represents virtual addresses of pages. A colored pixel indicates that the page was referenced during the associated sample window; a white pixel indicates no reference. The vertical gridlines are spaced 200 sample intervals apart and the horizontal gridlines are spaced 50 pages apart. The map reveals the locality sets of the program and shows dramatically that locality sets are stable over extended periods (phases), punctuated by sharp shifts to other locality sets. For over 99 percent of the time in this map, the pages seen in a sample interval are a near perfect predictor for the pages used in the next sample interval. Most executing programs have striking locality maps like this one. Each has its own unique locality behavior, like a digital signature. There is no randomness in the way programs use their code and data. (Source: Adrian McMenamin [mcm11], Creative Commons license.)

Origins of Locality and Working Sets

The ideas of locality and working sets (WS) are separate and distinct but are intimately connected. Locality is about the patterns of programs referencing their pages. Working set is about detecting those patterns in real time and using them to make memory management decisions.

Les Belady’s famous study of paging algorithms in 1966 demonstrated that LRU (least recently used) replacement consistently produced fewer page faults than FIFO (first in first out) replacement [bel66]. Belady reasoned that, if programs localize their references into subsets of their pages, the most recently used pages were most likely to be reused in the immediate future – and thus the least recently used pages were the best choices for replacement. Belady also noted that a weaker form of locality – nonuniform use of pages – explained why the fault rate functions of LRU and FIFO were not linear.

In 1966 there was a consensus among operating system engineers that locality was an observed tendency for programs to reuse their pages in the immediate future. I saw a connection between this informal idea of locality and an old programming concept. Programmers used the term “working set” to mean the pages that needed to be loaded in main memory so that a program would execute efficiently. It was up to the programmer to declare the working sets and design a schedule of page transfers to ensure that working sets were loaded in main memory. It seemed to me that the operating system could detect working sets by monitoring use bits, enabling it to automatically load working sets in memory without having to ask programmers to declare them. Pages whose use bits were set during a window of fixed size would estimate the program’s working set [den68a].⁶ Thus the two ideas, locality and working set, became a powerful partnership for allocating memory.
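To make the window idea concrete, here is a minimal Python sketch that computes the working set over a backward-looking window of T references. It assumes an exact reference trace is available rather than hardware use bits, and it is a plain sliding-window illustration, not the specific algorithms developed in the working set literature. The names `working_set` and `working_set_sizes` and the sample trace are hypothetical.

```python
from collections import Counter
from typing import List, Set

def working_set(trace: List[int], t: int, T: int) -> Set[int]:
    """Pages referenced in the window of the T most recent references
    ending at virtual time t (t counts references, starting at 1)."""
    return set(trace[max(0, t - T):t])

def working_set_sizes(trace: List[int], T: int) -> List[int]:
    """Working set size after each reference, in one pass over the trace.
    A Counter tracks how often each page occurs inside the window."""
    in_window: Counter = Counter()
    sizes = []
    for t, page in enumerate(trace):
        in_window[page] += 1
        if t >= T:                        # oldest reference slides out
            old = trace[t - T]
            in_window[old] -= 1
            if in_window[old] == 0:
                del in_window[old]
        sizes.append(len(in_window))
    return sizes

trace = [1, 2, 1, 3, 3, 3, 4]             # hypothetical reference string
print(working_set(trace, t=4, T=3))       # pages seen in references 2..4
print(working_set_sizes(trace, T=3))
```

The window length T is the single tunable parameter: a larger T keeps more pages in the estimated working set, a smaller T keeps fewer.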

Let’s examine locality more closely. From page reference maps, early virtual memory researchers saw that programs used only a small subset of their pages at any given time. The pages that were used together were seen as “spatially clustered” because the use of one implied that another would be used soon. For example, the pages of a loop or the pages of a code module are spatially clustered. Spatial clustering implied temporal clustering. These two terms became favorite ways of explaining why locality was a characteristic of program execution.

In the decade after 1968, my students and I, along with other researchers, studied how locality is manifested in actual programs. We studied and measured many programs, leading us to introduce the terms “locality sets”, “phases”, and “transitions”. These terms captured the recurring structure of locality – periods of stability punctuated by abrupt transitions.

⁶ Although WS and LRU favor most-recently-used pages, they are not the same. LRU operates within a fixed memory space but does not advise the operating system what size it should be. WS measures the locality set and advises the operating system to allocate just the right amount of memory required to hold it. The WS contents are the most recently used pages in the window, but there the similarity with LRU ends.

We concluded that temporal clustering is a rather poor description of the punctuated stability observed in executing programs. Temporal clustering exemplifies “slow drift” but not “sudden change”. To confirm this, we built and studied mathematical models of locality. My student Jeff Spirn tested the “slow drift” hypothesis by formulating a series of mathematical models of slow drift behavior and then examining how well each model predicted the observed LRU miss-rate function of a program [spi72]. Spirn found that simple models such as the independent reference model (each page is referenced with a fixed probability) and the independent stack distance model (each LRU stack distance occurs with a fixed probability) led to poor predictions of LRU miss rate.

My student Kevin Kahn modeled the punctuated stability behavior directly [kah76]. In his models, states were locality sets and phases were holding times in the states. Phase-transition models parameterized from measurements of page reference maps yielded excellent agreement between predicted and actual LRU miss rates. Moreover, because it adjusted to locality sets, a WS policy generated less paging than LRU without using more memory.

Wayne Madison and Alan Batson confirmed that these key aspects of locality – locality sets, phases, and transitions – exist at the source code level [mad76]. They concluded that locality is not an artifact of the way that compilers lay out data and code blocks on pages of address space. Design techniques such as loop iteration, divide-and-conquer, and modularity lead to subsets of pages being used for extended periods. The locality seen in page references is the image of the higher-level locality created as programmers design their algorithms.

Programmers who understood that virtual memory performs better with programs of good locality easily designed programs that ran well in virtual memory [say69]. Every program we ever saw exhibited locality. No program used its pages randomly. By the mid 1970s, we had settled on the definition of locality in the accompanying box [den80].

The Locality Principle

Executing processes reference their memory objects with punctuated stability described by:

$$(L_1, H_1), (L_2, H_2), \ldots, (L_i, H_i), \ldots$$

where L_i is a locality set and H_i is the holding time of its phase. The shortest sampling interval required to see the full locality set is likely to be a small fraction of holding time. Successive locality sets are likely to be mostly different, with few overlaps.

Recent research by Chen Ding and his students with large shared caches has demonstrated that locality is observed in cache reference patterns. Consequently, working set memory management also applies to shared caches, just as in multiprogrammed operating systems [xia11, xia13, xia18].

I have found that students today often have trouble understanding the principle of locality. Many are like McMenamin before his study: it seems counterintuitive that programs have such pronounced locality behavior, or that tracking locality optimizes performance. This misunderstanding may be rooted in their own experience of programming: they were not concerned about memory constraints and did not consciously design their programs to have long phases of stable references to objects. Yet the experimental evidence repeatedly shows that when their program is embedded as a module into a larger software system, the whole system displays the phase-transition behavior of locality.

This misunderstanding is supported by the limited definitions of locality in operating systems textbooks. The books usually define locality as a combination of temporal and spatial closeness of references. These definitions ignore the empirical fact of abrupt changes at phase transitions. Notice that random assignment of data to pages might remove spatial locality but it will not remove temporal locality. The same phases and transitions will be observed on the page reference map.

Let us illustrate with Figure 1 how we can measure some properties of locality behavior from a page reference map. The figure shows five locality sets and their associated phases for a sampling window T of approximately 380K time units. A measurement of the graph yields Table 1, where size is the number of pages in the locality set, length is the number of sample intervals in a phase, and fraction is the percentage of the full address space covered by the locality set. Cache policies that detect locality sets in Figure 1 would need only memory sufficient to hold 15-33% of the address space to achieve 100% of the performance.

Table 1

set   size (pages)   length (sample intervals)   fraction (of address space)
1     50             180                         25%
2     65             220                         33%
3     30             220                         15%
4     55             50                          28%
5     35             180                         18%

Notice that the punctuated-stability idea of locality easily translates into measurements of locality set sizes, phase lengths, and transition probabilities on page reference maps, whereas the vague terms temporal and spatial locality do not suggest measurement protocols.

Page reference maps, which were a useful tool in the early studies of virtual memory performance, continue to be useful today in studies of cache performance. They are striking evidence of locality and visually convey considerable insight into the dynamics of executing programs.

Like other scientific discoveries, the Locality Principle began as a hypothesis and was accepted as scientific fact only after many validations. The original notion of “slow drift” temporal locality gave way to a more sophisticated notion of “punctuated stability” characterized by phases and transitions. Working set is a real-time detector of this behavior. It allows a virtual memory manager to dynamically adjust memory allocations and maximize system throughput.

Memory Space-Time Law

Operating system designers have always been concerned with performance. They would like to state performance guarantees for throughput and response time that will hold over a wide range of workloads and numbers of simultaneous users. One of the most challenging questions was how to establish a connection between memory management and the key performance measure of throughput. When can we say that optimizing memory management optimizes system throughput?

When jobs use less space, we can get more of them into memory and increase throughput because of the parallelism. And when they complete in less time, we can process more of them over time. This is why many of us have an intuition that the smaller the space-time footprint of programs, the larger is the system throughput.

The space-time footprint is the number of page-seconds of memory usage of an executing job. A page-second is a unit of rent for using memory. It is analogous to the idea of charging for office space by the number of square-feet rented, or charging labor on a project by person-hours. When a process loads 1 page into main memory for S seconds, the process adds S page-seconds to its memory bill. Loading S pages for 1 second also adds S page-seconds to the bill.

Jeff Buzen in 1976 discovered the “memory space-time law” (MST Law), an exact formula that links space-time and system throughput [buz76]. It says “average total memory used by all jobs = system throughput × average space-time footprint per job”. In symbols,

$$M = XY$$

The proof is simple. Measure the system in an interval of length T. Let Z denote the total space-time used by all the jobs in that interval. (For fixed memory, Z = MT.) The throughput is the number of jobs completed in that interval (C) divided by the length of the interval: X = C/T. The mean space-time footprint of a job is Y = Z/C. The product of X and Y is obviously M = Z/T.⁷
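The bookkeeping in this proof can be checked numerically. The sketch below is illustrative only: the function name `mst_quantities` and the job footprints are made-up values, and it assumes, as the proof does, that all space-time accumulated in the interval is attributed to the jobs counted as completed.

```python
import math
from typing import List, Tuple

def mst_quantities(footprints: List[float], T: float) -> Tuple[float, float, float]:
    """Return (M, X, Y) for one measurement interval of length T seconds.

    footprints: space-time used by each completed job, in page-seconds.
    """
    Z = sum(footprints)   # total space-time used in the interval
    C = len(footprints)   # number of jobs completed
    X = C / T             # throughput, jobs per second
    Y = Z / C             # mean space-time footprint per job
    M = Z / T             # average memory in use, pages
    return M, X, Y

# Hypothetical interval: three jobs complete in 10 seconds.
M, X, Y = mst_quantities([120.0, 80.0, 200.0], T=10.0)
print(M, X, Y)
print(math.isclose(M, X * Y))   # the law M = XY holds up to rounding
```

Because C cancels in the product (C/T)(Z/C), the identity holds for any set of footprints; the example only makes the cancellation visible.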

The MST Law is general and applies to any system or network that has memory usage and throughput. Some people find it hard to believe that the relation between memory usage and throughput is this simple. It really is.

The obvious conclusion is that a policy that minimizes space-time will maximize throughput. To make it easy to tell when this is happening, we will, in the analytics to follow, use space-time to measure the memory usage of working sets and other memory policies. If needed, we can easily convert space-time measures to time averages by dividing space-time by the length of time of the measurement.

There is a complication. The space-time measures of memory policies are defined in virtual time. The space-time needed for the MST Law is defined in real time. We need a way to convert virtual space-time to real space-time.

The most accurate way to do this is with the help of a queueing network model [den78b]. The model would account for all delays beyond virtual time, such as input-output and page transfers. Setting up such a model is beyond the scope of this tutorial.

However, a good approximation can be made simply by augmenting virtual space-time with the space-time accumulated while servicing page faults. To do this, we require these quantities:

D = page fault delay, typically 10^6 memory accesses
N = length of virtual time a process is executed
S = virtual space-time accumulated by the process during execution
C = number of page faults accumulated during execution
m = miss rate, C/N

The space-time cost of one page fault is the mean space S/N times the delay D. For all page faults it is (S/N)(D)(C) = (S)(D)(C/N) = SDm. Therefore the total real space-time is estimated as S + SDm, or in the notation of the MST Law,

$$Y = S(1 + Dm)$$

Thus the real space-time is approximately the virtual space-time dilated by the factor 1 + Dm. For example, if the miss rate is 10^-4 (1 fault in 10^4 memory accesses) and D = 10^6, the dilation factor is 1 + 10^6 × 10^-4 = 101.
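The dilation estimate is easy to evaluate directly. The following Python sketch simply encodes Y = S(1 + Dm); the function names are hypothetical, the delay D = 10^6 is the typical value quoted in the text, and the virtual space-time of 2000 is a made-up figure.

```python
def dilation_factor(D: float, m: float) -> float:
    """Factor by which page-fault delays stretch virtual space-time."""
    return 1 + D * m

def real_space_time(S: float, D: float, m: float) -> float:
    """Estimate real space-time Y from virtual space-time S."""
    return S * dilation_factor(D, m)

D = 10**6          # page fault delay in memory accesses (typical value)
m = 1 / 10**4      # miss rate: 1 fault in 10^4 memory accesses
print(dilation_factor(D, m))          # approximately 101
print(real_space_time(2000, D, m))    # hypothetical S = 2000 page-accesses
```

Even a miss rate that looks tiny multiplies the memory bill by two orders of magnitude, which is why small differences in m between policies matter so much.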

⁷ The memory space-time law can be seen as an instance of Little’s law. Little’s law says that the mean number in system is the product of the throughput and the system response time. We can interpret M as the mean number of pages in the system; X as the throughput; and Y as the aggregate holding time accumulated by a job for all its pages.

Policies such as LRU or FIFO work with a fixed number k of pages; their total virtual space-time is kN. Thus, for a given k, Y is smaller only when m is smaller and the policy with lowest miss rate will have highest system throughput. But there is more to the story. Once the miss rate as a function of k is known, there is a value of k that minimizes the expression for Y [den80]. Even for the optimal policy, there is a best memory size that minimizes the space-time.
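A brute-force search shows how such a best k can be found once fault counts are known. This sketch substitutes S = kN and m = F(k)/N into Y = S(1 + Dm), giving Y(k) = k(N + D F(k)). The function name `best_memory_size` and the fault counts are invented for illustration, and the small delay D = 100 is chosen only to keep the numbers readable.

```python
from typing import Dict, Tuple

def best_memory_size(F: Dict[int, int], N: int, D: int) -> Tuple[int, Dict[int, int]]:
    """Return the memory size k that minimizes Y(k) = k * (N + D * F[k]),
    together with the full table of Y values.

    F maps memory size k (pages) to the fault count of a fixed-space policy.
    """
    Y = {k: k * (N + D * faults) for k, faults in F.items()}
    k_best = min(Y, key=Y.get)
    return k_best, Y

# Hypothetical fault counts for a run of N = 1000 references.
F = {1: 900, 2: 400, 3: 50, 4: 45, 5: 44}
k_best, Y = best_memory_size(F, N=1000, D=100)
print(k_best, Y)
```

In this made-up example the fault count drops sharply at k = 3 and barely improves afterward, so adding a fourth or fifth page costs more space-time than it saves.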

When we discuss the variable-space policy working set, we will see that minimizing virtual space-time maximizes system throughput.

Two Contrasting Views of Memory Management

The familiar way of looking at the memory sees a single CPU (representing a job in execution) accessing pages in a fixed-size memory. No other CPU or memory region is visible. This is called the Fixed Memory View (FMV). (See Figure 2.) In this view, the job gets a fixed space in main memory and is unaffected by the presence of other jobs in the memory. The performance of a replacement policy is then completely determined by its total fault count function.

Figure 2. The Fixed Memory View (FMV) sees the system as a single CPU accessing a single RAM (main memory) of fixed size k pages. Address mapping hits are translated directly to RAM addresses. Address mapping misses trigger page faults that cause up and down page moves between RAM and DISK (secondary memory). The objective is to efficiently compute the fault function F(k), which counts the number of page faults when memory size is k pages, and then select replacement policies that minimize F(k). In a system with multiprogramming, the RAM visible in this view is a fixed region of the full RAM.

Extensive experimental studies and experience with operating systems led designers to favor two basic replacement policies. FIFO (first in first out) treats the memory space of k pages as a FIFO queue and ignores page usage. Its primary attraction is simplicity and almost negligible overhead. LRU (least recently used) treats the memory as a container of the k most recently used pages; at a page fault, it replaces the page in memory that has not been used for the longest time. Although LRU generates fewer page faults than FIFO, LRU has a high implementation overhead. Many analysts believed that the savings of LRU are cancelled by its overhead.
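Both policies can be simulated in a few lines of Python to count faults for a given memory size k. This is a sketch for experimentation, not an operating system implementation: the function names are hypothetical, and the reference string is a short one commonly used in textbook exercises rather than data from this tutorial.

```python
from collections import OrderedDict, deque
from typing import List

def fifo_faults(trace: List[int], k: int) -> int:
    """Count page faults for FIFO replacement with k page frames."""
    queue: deque = deque()       # pages in order of arrival
    resident = set()
    faults = 0
    for page in trace:
        if page in resident:
            continue             # hit: FIFO ignores page usage
        faults += 1
        if len(queue) == k:
            resident.remove(queue.popleft())   # evict the oldest arrival
        queue.append(page)
        resident.add(page)
    return faults

def lru_faults(trace: List[int], k: int) -> int:
    """Count page faults for LRU replacement with k page frames."""
    memory: OrderedDict = OrderedDict()   # least recently used page first
    faults = 0
    for page in trace:
        if page in memory:
            memory.move_to_end(page)      # hit: mark as most recently used
        else:
            faults += 1
            if len(memory) == k:
                memory.popitem(last=False)   # evict least recently used
            memory[page] = True
    return faults

trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(trace, 3), lru_faults(trace, 3))   # prints "15 12"
```

The extra work LRU does on every hit (`move_to_end`) is a small-scale picture of the overhead mentioned above: FIFO touches its data structure only on faults.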

An elegant compromise between FIFO and LRU is called CLOCK. It treats the FIFO queue as a circular list of size k with a scanning pointer; the page names are analogous to the numerals on the rim of a clock and the pointer to the clock’s hand. At a page fault, the operating system moves the hand along the list, skipping over pages with use bits on (and resetting them). When it finds a page with use bit off, it selects that page for replacement. CLOCK has overhead comparable to FIFO and performance comparable to LRU. CLOCK is commonly used in operating systems. (In early virtual memory systems, CLOCK was called FINUFO, for first-in-not-used-first-out [den68a].)
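A minimal simulation of the scan is sketched below. Two details are assumptions of this sketch, since the description above does not fix them: a newly loaded page has its use bit set (it has just been referenced), and the hand advances past the frame it has just filled. The function name `clock_faults` and the reference string are illustrative.

```python
from typing import Dict, List, Optional

def clock_faults(trace: List[int], k: int) -> int:
    """Count page faults for CLOCK replacement with k page frames."""
    frames: List[Optional[int]] = [None] * k   # page held by each frame
    use: List[bool] = [False] * k              # use bit of each frame
    where: Dict[int, int] = {}                 # page -> frame index
    hand = 0
    faults = 0
    for page in trace:
        if page in where:
            use[where[page]] = True            # hit: hardware sets the use bit
            continue
        faults += 1
        # Move the hand past frames whose use bit is on, resetting each bit.
        while frames[hand] is not None and use[hand]:
            use[hand] = False
            hand = (hand + 1) % k
        if frames[hand] is not None:
            del where[frames[hand]]            # evict the page with use bit off
        frames[hand] = page
        use[hand] = True                       # assumption: set on load
        where[page] = hand
        hand = (hand + 1) % k                  # assumption: step past new page
    return faults

trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(clock_faults(trace, 3))
```

On a hit the only work is setting one bit, which is why the overhead stays close to FIFO while the use bits give the policy some of LRU's sensitivity to recent use.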

In 1970, Richard Mattson and his IBM colleagues discovered a highly efficient way to represent a large class of paging algorithms and compute their fault functions. They called their theory “stack algorithms” [mat70]. A stack algorithm is a paging policy whose memory contents can be represented with a single list of all the job’s pages called the stack, such that the contents of k-page memory are the first k elements of the stack. LRU’s stack lists all the job’s pages from most to least recently used; the contents of the k-page LRU memory are the first k pages in the stack. At each page reference, LRU’s stack is updated by moving the referenced page to the top and pushing the intervening pages down one position. For a general stack algorithm, much the same happens: the referenced page moves to the top and the intervening pages are rearranged downward according to their relative priorities assigned by the paging policy. The position of the next reference in the stack is called stack distance. The fault function F(k) can be computed simply as the number of stack distances larger than k. This elegant theory is frequently discussed in operating systems textbooks.
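For LRU, the stack update and the fault function can be written down directly. The sketch below keeps the stack as a plain Python list, which is simple but not efficient for large address spaces; it treats the first reference to a page as an infinite stack distance, so that it counts as a fault at every memory size. The function names and the reference string are assumptions made for illustration.

```python
from typing import Dict, List

def lru_stack_distances(trace: List[int]) -> List[float]:
    """Stack distance of each reference under LRU (1 = top of stack).
    A page not yet in the stack gets distance infinity."""
    stack: List[int] = []                 # most recently used page first
    distances: List[float] = []
    for page in trace:
        if page in stack:
            distances.append(stack.index(page) + 1)
            stack.remove(page)            # intervening pages move down one
        else:
            distances.append(float("inf"))
        stack.insert(0, page)             # referenced page moves to the top
    return distances

def fault_function(trace: List[int], max_k: int) -> Dict[int, int]:
    """F(k) for k = 1..max_k from a single pass over the trace:
    the number of stack distances larger than k."""
    d = lru_stack_distances(trace)
    return {k: sum(1 for x in d if x > k) for k in range(1, max_k + 1)}

trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fault_function(trace, 6))
```

The point of the stack representation is that one pass yields F(k) for every memory size at once, instead of one simulation per value of k.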

The stack theory did not answer two important questions: What is the optimal amount of memory to allocate? How is system throughput related to the fault function F(k)? These questions were answered by other modeling techniques; details are in [den80].

Unfortunately, the FMV and its analytic theory are not very helpful for real operating systems, which allow multiple jobs to reside in RAM at the same time. The FMV gives no insight into important questions including:

• How to partition the RAM among the jobs?

• How much space to give each job?

• How to manage variations in the space allocations?

• How to handle page replacement to maximize system throughput?

• How to manage interactions among jobs such as a page fault in one stealing a page from another?

• How to prevent thrashing, a collapse of system throughput when too many jobs are loaded into RAM?

We obviously need a different way of thinking about the common situation of multiprogramming. It is called the shared memory view (SMV). (See Figure 3.)

Figure 3. The shared memory view (SMV) sees the system as a set of CPUs (one for each job) sharing the M-page RAM. The memory is partitioned among N jobs, each getting its own set of page frames disjoint from all the others. A page fault may trigger the page replacement algorithm to steal a page from another job, thereby increasing the space occupied by the faulter and decreasing the space occupied by the victim.

Because of all the factors involved, the optimal management of shared memory cannot be inferred from fixed memory. Obvious extensions of the fixed memory view lead to poor performance and instability. For example, many operating systems simply extended the fixed memory view to include all of RAM. The resulting global LRU would replace the oldest unused page in RAM regardless of which job it belongs to. Unfortunately, the global list of pages does not reflect actual recency of use within jobs. It is ordered mainly by the round-robin scheduler of the ready list: the pages atop the global LRU stack belong to the job that most recently received a time slice. If paging activity is high, by the time a job cycles back to the front of the ready list some of its pages have been removed. The cascading effect causes high paging in every job, pushing the whole system into a state of “paging to death”, in which every job spends most of its time queued at the DISK and CPU throughput collapses. This condition is called thrashing. (See Figure 4.)

Figure 4. Thrashing is the collapse of system throughput when too many jobs are loaded into RAM at once. It is a chaotic condition whose onset cannot be predicted accurately: the trigger threshold N* is unpredictable and very sensitive. Loading one additional job can trigger thrashing. This can happen in any shared memory, such as cache, not just RAM.

We need a different way of thinking about managing shared memory. The working set model provides the basis. We need to abandon the fixed memory view and instead use the working set interpretation of the shared memory view to see what is going on. The remainder of this paper will approach this in two stages. First, we will define analytic methods for computing working set statistics, such as miss rate, from a given address trace. These methods do not depend on locality or any other assumptions about program behavior. Second, we will show that the working set policy is close to optimal for programs whose address traces conform to the principle of locality. In that case, the working set space-time is very close to the optimal space-time.

Working Sets

Originally the term “working set” was an informal term meaning the smallest set of a job’s pages that needed to be loaded in main memory for efficient execution. The working set model gave a precise definition in terms of the page-reference behavior of an executing job. The executing job itself informs us of what memory it needs, without regard for external factors such as interrupts. Thus, working sets are a measure of the intrinsic, dynamically-varying demand of a computation for memory [den68a, den72].

The formal definition is that the working set at a particular time t is the pages referenced in a backward-looking window of size T including time t. It is denoted W(t,T). The size, w(t,T), is the number of distinct pages in W(t,T); the size is always at least 1 and never larger than T. Size may be considerably smaller than T due to repeated references to the same pages within the window. Figure 5 illustrates.

Figure 5. This example shows the working set of a simple job at two distinct times. We assume time is discrete and each clock tick represents a single page reference. The series of numbers above the time line, called an address trace, is the sequence of page numbers accessed by a job. In this case, there are accesses at times t=1,2,…,15. The window size is T=4. The backward window at t=8 contains 4 references to three distinct pages; its size is 3. The backward window at t=15 contains 4 references to four distinct pages; its size is 4. We imagine the window sliding along the time line, giving us a dynamically varying series of working sets.
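The window definition translates directly into a slice of the address trace. The sketch below assumes the trace is a Python list indexed from time 1; the function name and the sample trace are made up for illustration and are not the trace of Figure 5.

```python
def working_set(trace, t, T):
    """W(t, T): distinct pages referenced at times t-T+1 .. t (times are 1-based).

    Near the start of the trace the window is truncated at time 1.
    """
    start = max(1, t - T + 1)
    return set(trace[start - 1:t])


trace = [1, 2, 3, 2, 1, 4, 4, 5]          # hypothetical trace
for t in (2, 5, 8):
    ws = working_set(trace, t, 4)
    print(t, sorted(ws), len(ws))          # len(ws) is the size w(t, T)
```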

The working set window can be thought of as a lease. A lease is a guarantee to hold a page in RAM for a minimal period of time T. A page’s lease can be stored in a timer register associated with a page frame. When a page is loaded into RAM or reused while in RAM, its lease is reset to T. It ticks down to 0 after T time units of non-use. When the lease runs out, the page is evicted from the working set. The lease definition is equivalent to the window definition [li19].
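The equivalence of the lease and window definitions can be checked on a small trace. In this sketch the dictionary `lease` stands in for the per-frame timer registers; the function name and the sample trace are assumptions for illustration only.

```python
def working_sets_by_lease(trace, T):
    """Working set after each reference, maintained with per-page lease timers."""
    lease = {}                       # page -> remaining lease, in time units
    sets = []
    for page in trace:
        for p in list(lease):        # one tick of virtual time
            lease[p] -= 1
            if lease[p] == 0:        # lease ran out: evict from the working set
                del lease[p]
        lease[page] = T              # load or reuse resets the lease to T
        sets.append(set(lease))
    return sets


trace = [1, 2, 3, 2, 1, 4, 4, 5]     # hypothetical trace
T = 2
by_lease = working_sets_by_lease(trace, T)
by_window = [set(trace[max(0, t - T):t]) for t in range(1, len(trace) + 1)]
print(by_lease == by_window)
```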

The Working Set Policy

Working set memory management partitions the main memory into dynamically changing regions, one for the working set of each job using the memory. The basic idea is to maintain each job’s working set and not allow any other process to steal pages from it [den68a]. (See Figure 6.) This is implemented by maintaining a free space in memory, usually smaller than any of the working sets. When a job references a page not in its working set (a page miss or fault), the operating system transfers a page from free space to the job, increasing its working set size by one. In parallel, when a page times out from its job’s window T, the operating system transfers it back to the free space, decreasing the working set by one. In this regime page faults and evictions need not coincide as they do when memory allocation is fixed.

Figure 6. The box represents memory, a RAM or cache, in use by N executing jobs. Each job’s working set is loaded in memory. The memory not occupied by working sets is called FREE. When a job encounters a page miss, the memory slot to hold the new page is transferred from FREE to the job’s working set, thus enlarging that working set by one page. (If FREE is empty, the incoming page replaces the LRU page of the working set.) When a working set page lease T times out, the page is evicted from the working set and the empty memory slot returned to FREE, thus decreasing that working set by one page. When a job quits, all its pages are erased and their memory slots returned to FREE. A scheduler maintains a queue of jobs wanting to use memory, admitting the next one only if FREE is large enough to hold its working set. This policy prevents any job from stealing a page from another during a page fault, thereby protecting the system from thrashing.

An important question is how to choose the window size T. When we return to this question later, we will see that we can select a single global value of the window T for which working set management delivers near-optimal system throughput.

Moreover, because the working set policy prevents processes from stealing pages from each other, it is immune to thrashing.

These aspects (simplicity of working set memory management, optimality of throughput, and resistance to thrashing) make working set memory management the ideal of operating systems and caches [den80].

Working set analytics, discussed next, shows us how to efficiently measure working set statistics to determine the memory capacity of a system and the throughput likely to be observed under a working set policy. All the data needed for the working set statistics can be measured in a single pass of an address trace and captured as a histogram of reuse intervals. Each statistic is a simple linear computation from those data. The analytics do not depend on any locality or stochastic assumptions about how jobs refer to their pages.

Traces, Reuses, Cold and Warm Starts

In this section we will define the basic terminology and notation used throughout working set analytics.

All the statistics are measured in discrete virtual time, which is the time of program execution, one tick per memory access. Delays for input, output, or time-slicing are ignored. Measuring in virtual time allows us to see the inherent memory demand of a program without distortion by random interrupts. When needed, we can convert these measures back into real time by inserting delays when interrupts occur.

An address trace is a recording of the sequence of page numbers referenced (accessed) by a job at times t=1,2,…,N; r(t)=i means that the process accessed (used) page i at time t. OS performance analysts use address traces as inputs to simulators of memory management policies in the OS or the cache. Traces can also be used to build page reference maps such as Figure 1. The intervals between successive references to a page are called reuse intervals. Figure 7 illustrates.

Figure 7. The address trace of Figure 5 is shown again in the top row. It spans 15 time units (N=15) and 5 pages (M=5). The reuse intervals appear just below each reference; a reuse interval is the time since the prior reference. The mark “x” indicates first references, which have no prior use. Reuse intervals are important because they indicate whether a page is in the working set. For example, at time t=6, page 2 is reused with interval 4; a working set with window 3 or less will generate a page miss at that time.
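The row of reuse intervals in such a figure can be produced in one pass by remembering the time of each page's last reference. The following sketch uses a made-up trace (not the one in the figure) and marks first references with "x" as the caption does.

```python
def reuse_intervals(trace):
    """Reuse interval of each reference, or "x" for a first reference."""
    last = {}                                    # page -> time of its last reference
    intervals = []
    for t, page in enumerate(trace, start=1):    # virtual time starts at 1
        intervals.append(t - last[page] if page in last else "x")
        last[page] = t
    return intervals


trace = [1, 2, 3, 2, 1, 4, 4, 5]                 # hypothetical trace
print(reuse_intervals(trace))
```

A reference with interval k is a hit for every window T ≥ k and a miss for every smaller window.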

The miss count, mc(T), is the number of misses, that is, references not in the working set. Misses cause page faults. We occasionally speak of the “miss rate”, which is simply the miss count divided by N. It is sometimes suggested that, to prevent the large speed gap of 10^6 between main and secondary memory from ruining performance, we should choose T large enough, if possible, to keep miss rates no higher than 10^-6. As we will see, however, this is not the best way of choosing T.

The first references to pages are different from the others because they have no reuse intervals. Whether they cause initial page faults depends on how the memory is initialized. There are two possible initializations. The cold start (or normal) initialization is simply an empty memory; the first references cause page faults. The warm start initialization contains all M pages; the first references cause no page faults. The working set contents throughout the address trace are the same for cold and warm start. The difference is that with cold start every first reference is a miss; with warm start every first reference is a hit. The miss count mc(T) is used for cold start and a new count mw(T) is used for warm start; the difference is

$$\mathit{mc}(T) - \mathit{mw}(T) = M$$

The distinction between cold and warm starts is used in operating systems for the general notion of whether resumption of a suspended job occurs with empty memory or with the previous contents of memory. Cold starts are slow because all the data must be reloaded from the disks or other sources; warm starts are fast but require a lot of initial memory. The address trace model implicitly assumes warm re-starts of processes suspended by interrupts: the memory contents at time t+1 are just what they were after time t, whether or not an interruption occurred between t and t+1.

Some analysts speculated that virtual memory performance would improve if the OS achieves warm start by preloading pages. Our formulas, however, show that in the long run there is very little difference between initial cold and warm starts of address traces.

Mean Working Set Size

The definition of mean working set size over an address trace is

$$s(T) = \frac{1}{N}\sum_{t=1}^{N} w(t,T) = \frac{\mathit{st}(T)}{N}$$

where st(T) is the space-time occupied by working sets in the trace; one unit of space-time means that one page (space) was in memory for one time unit (time). We see that [den68a, den72, den78a]

• When t<T, the T-window extends backward before the start of the trace; only the portion of the window contained in the trace counts toward st(T). Thus, for 1≤t<T, the working set size can be written w(t,t).

• For t≥T, there are N-T+1 working sets with window T for the remainder of the address trace.

In some of the analytics discussed below, this distinction is important, and we will separate the original sum into two parts for t<T and t≥T. We will work with four space-time and counting measures:

st(T) = space-time accumulated by working sets of window T

mc(T) = cold-start miss count, the number of misses with window T

mw(T) = warm-start miss count, same as mc(T) excluding M first references

mwh(T) = warm-start-hot-finish miss count (defined below)

Each measure can be converted to a time average by dividing by N, the trace length. For example, the classical mean working set size and miss rate are:

$$s(T) = \frac{\mathit{st}(T)}{N} \qquad\qquad m(T) = \frac{\mathit{mc}(T)}{N}$$
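Before turning to the fast algorithms, it may help to see these measures computed directly from their definitions. The sketch below is a deliberately slow brute-force computation on a made-up trace; the function name `ws_measures` is an assumption for illustration.

```python
def ws_measures(trace, T):
    """Return (st, mc, s, m) for window T by direct evaluation of the definitions."""
    N = len(trace)
    st = 0                                   # space-time: sum of w(t, T) over t
    mc = 0                                   # cold-start miss count
    for t in range(1, N + 1):
        page = trace[t - 1]
        previous = set(trace[max(0, t - 1 - T):t - 1])   # W(t-1, T), empty at t = 1
        if page not in previous:
            mc += 1                          # reference not in the working set
        st += len(set(trace[max(0, t - T):t]))           # w(t, T)
    return st, mc, st / N, mc / N


trace = [1, 2, 3, 2, 1, 4, 4, 5]             # hypothetical trace
print(ws_measures(trace, 2))
```

This direct method inspects a T-wide window at every time step, which is the cost the recursions developed later avoid.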

Column Sums, Row Sums, and Runs

We can depict dynamic memory use with a bit-matrix with one column for each reference for t=1,…,N and one row for each page i=1,…,M. Position (i,t) is 1 if page i is in working set W(t,T) and 0 if i is not in W(t,T). Let us call this matrix the working set residency map. In the map, the space-time occupied by working sets, st(T), is the total number of positions containing 1. Figure 8 illustrates.

Figure 8. The address trace given earlier is shown across the top, labeling the columns, and the pages along the side, labeling the rows. Each map position (i,t) is marked with 1 if the page i is in the working set (T=4) at that time t, or 0 otherwise. The columns represent the different working sets and the number of 1s in a column is the working set size. The column at t=0 indicates that the working set is initially empty (cold start). (Warm start would be represented by a column of 1s.) Note that a miss occurs whenever there is a 1 in a row immediately preceded by a 0. Note also that this is a miniature page reference map with sampling interval 1.

Let col(t) be the number of 1’s in column t of the matrix; notice col(t) is the working set size w(t,T).

Let row(i) be the number of 1’s in row i.

A run is a series of consecutive 1s starting with a miss and ending with either a 0 or end of trace. Figure 9 illustrates. The final run may terminate at time N with no more 0’s.

Figure 9. A run is a time interval during which a particular page is continuously in the working set. In this illustration, each vertical mark is a reference to the particular page and the distances between vertical marks are reuse intervals. A run begins with a miss, contains zero or more reuse intervals ≤T, and ends in a reuse interval >T. In the reuse interval >T, the page remains in the working set for T-1 time units beyond the previous reference: the overhang. A run can end with a smaller overhang after the last reference to the page; a final interval of length k≤T houses an overhang of k-1, not T-1. Each new miss starts a new run.

Little’s Law for Working Sets

Little’s law is very useful in queueing analyses because it gives a relation between three mean values: the mean number in a system is the product of the mean holding time in the system and the throughput of the system. For a working set considered as an evolving system containing pages, the number is the mean working set size, the holding time is the mean length of a run, and the throughput is the miss rate:

$$s(T) = R(T)\,m(T)$$

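This law is easy to verify from our definitions. Because every run begins with a miss, the total number of runs is mc(T). The mean run length is

$$R(T) = \frac{1}{\mathit{mc}(T)}\sum_{i=1}^{M} \mathit{row}(i)$$

Then:

$$s(T) = \frac{1}{N}\sum_{t=1}^{N}\mathit{col}(t) = \frac{1}{N}\sum_{i=1}^{M}\mathit{row}(i) = \frac{\sum_{i=1}^{M}\mathit{row}(i)}{\mathit{mc}(T)}\cdot\frac{\mathit{mc}(T)}{N} = R(T)\,m(T)$$

In the example of Figure 8, the total space-time is 48 and the number of runs is 7; thus s(4)=48/15, m(4)=7/15, and R(4)=48/7.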
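The same check can be carried out mechanically by building a residency map and counting its 1s and runs. The sketch below uses a made-up trace rather than the one in Figure 8; the function names are illustrative, and exact fractions are used so that the identity can be tested without rounding.

```python
from fractions import Fraction


def residency_rows(trace, T):
    """Row of the residency map for each page, with a cold-start column at t = 0."""
    N = len(trace)
    rows = {}
    for page in set(trace):
        row = [0]                                   # t = 0: working set is empty
        for t in range(1, N + 1):
            window = trace[max(0, t - T):t]         # references at times t-T+1 .. t
            row.append(1 if page in window else 0)
        rows[page] = row
    return rows


def little_law_terms(trace, T):
    """Return (s, m, R): mean size, miss rate, and mean run length."""
    rows = residency_rows(trace, T)
    N = len(trace)
    space_time = sum(sum(row) for row in rows.values())
    # a run starts wherever a 1 is immediately preceded by a 0
    runs = sum(1 for row in rows.values()
               for a, b in zip(row, row[1:]) if a == 0 and b == 1)
    return Fraction(space_time, N), Fraction(runs, N), Fraction(space_time, runs)


s, m, R = little_law_terms([1, 2, 3, 2, 1, 4, 4, 5], 2)   # hypothetical trace
print(s, m, R, s == R * m)
```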

Little’s law also quantifies the amount of overhang (see Figure 9). For a very long address trace, every run terminates with a reuse interval >T with overhang T-1. Take the “system” to be a delay line of T-1 time units representing the overhang. The throughput is m(T). The response (holding) time of a page in overhang is T-1. Therefore the mean number of pages in overhang is the product (T-1)m(T). (This argument is not exact because some overhangs at the end of the trace are shorter than T-1; we will show shortly how to correct for this.) The mean overhang is useful to assess the distance WS is from optimal behavior: if we could eliminate all overhangs, the working set would be optimal. We will prove this shortly.

Working Set Recursions

In working set analytics, recursions are simple formulas that relate a measure at window size T-1 to the measure at window size T. We will derive recursions for miss count and working set space-time. These recursions enable us to calculate miss rate and working set size iteratively in linear time, that is, O(N) for N the address trace length.

The algorithm for miss rate is much faster than its counterpart in the Fixed Memory View. The best algorithm in the Fixed Memory View simulates the stack in order to get the stack distances, requiring time O(MN). This difference can be significant. For example, a 32-bit address space (2^32) composed of 4096-byte pages (2^12) has M=2^20 (about 10^6) distinct pages; computing the fixed-memory fault count would therefore take a million times longer than the working set miss count.

In a single pass of the trace, we can accumulate a histogram of reuse counts,

$$c(k) = \text{number of reuse intervals of length } k$$

for k=1,…,N. We will also include one additional counter c(x) that counts the number of first references. These counters can all be updated in linear time (see footnote 8).

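By definition, there is no reuse interval preceding the first reference to a page. The next reference is a miss if the reuse interval length is greater than T. In other words, for warm start,

$$\mathit{mw}(T) = \sum_{k>T} c(k)$$

and thus for cold start

$$\mathit{mc}(T) = M + \mathit{mw}(T)$$

Raising the window from T to T+1 turns the c(T+1) references with reuse interval exactly T+1 from misses into hits, so the miss count satisfies the simple recursion

$$\mathit{mc}(T+1) = \mathit{mc}(T) - c(T+1)$$

with the initial condition mc(0)=N.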

Footnote 8. This can be done by maintaining timestamps, last(i), for the last reference to page i. At a reference to page i at time t, the reuse interval is k=t-last(i). After the last reference, the end interval is k=N+1-last(i). Adding 1 to c(k) for each of these events leaves corrected values in the counters. This is the same procedure as in the 1978 paper with Slutz [den78a].
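The timestamp procedure can be sketched as a single pass over the trace. The sketch keeps the reuse counts c(k) and the end-interval counts (written ec(k) here) in separate counters so that either can be inspected; the function name and the sample trace are assumptions for illustration.

```python
from collections import Counter


def reuse_histograms(trace):
    """One pass over the trace: reuse counts, end-interval counts, first references."""
    N = len(trace)
    last = {}            # last(i): time of the most recent reference to page i
    c = Counter()        # c[k]: number of reuse intervals of length k
    first = 0            # c(x): number of first references
    for t, page in enumerate(trace, start=1):
        if page in last:
            c[t - last[page]] += 1
        else:
            first += 1
        last[page] = t
    # end interval of page i: from its last reference to a phantom one at N+1
    ec = Counter(N + 1 - t_last for t_last in last.values())
    return c, ec, first


c, ec, first = reuse_histograms([1, 2, 3, 2, 1, 4, 4, 5])   # hypothetical trace
print(dict(c), dict(ec), first)
```

From these counters the warm-start miss count for any window T is the sum of c[k] over k > T, and adding the number of first references gives the cold-start count.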

Now we turn to a recursion for the working set size. To do this, we exploit an inclusion property: the runs for window T+1 include those for window T.

What happens when we increase T to T+1? At first approximation, every run is extended by just 1 unit: all the “reuse ≤T” intervals are also “reuse ≤T+1” intervals and only the final “reuse >T” has room for expansion. Its expansion is one unit of space-time. Because the total number of runs is mc(T), the total number of 1’s added to the reference map is mc(T). Thus

$$\mathit{st}(T+1) = \mathit{st}(T) + \mathit{mc}(T)$$

This recursion is exact for infinite address traces [den68a, den72], but contains an error for finite traces [den78a]. The errors occur in the end intervals of pages as follows. There are M end intervals, one for each page. An end interval is the time from the last reference to a page until the end of the trace. Notice that this is the same as if we pretend there is another, phantom reuse of page i at time N+1. Notice also that an end interval of length ≤T will not expand when T increases to T+1. To correct for this, we need to deduct from mc(T) the number of end intervals of length ≤T. We can define an end factor

$$e(T) = \text{number of end intervals of length} \le T$$

and then the accurate recursion is

$$\mathit{st}(T+1) = \mathit{st}(T) + \mathit{mc}(T) - e(T)$$

We can express this with the means

$$s(T+1) = s(T) + m(T) - \frac{e(T)}{N}$$

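Because e(T)≤M, the end factor vanishes as N becomes large (see footnote 9).

We can simplify this by looking closely at the way the end intervals contribute to e(T). There is only one end interval for each page i. Define the end-counter ec(k) as the number of pages whose end interval has length k. Then the sum of all the ec(k) is M. The last two terms in the st-equation above can be reduced:

$$\begin{aligned}
\mathit{mc}(T) - e(T) &= M + \sum_{k>T} c(k) - \sum_{1 \le k \le T} \mathit{ec}(k) \\
&= M + \sum_{k>T} c(k) - \Big( M - \sum_{k>T} \mathit{ec}(k) \Big) \\
&= \sum_{k>T} \big( c(k) + \mathit{ec}(k) \big) \\
&\triangleq \mathit{mwh}(T)
\end{aligned}$$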

Footnote 9. It is interesting to note that e(T)=w(N+1,T), the number of pages in a working set measured at the end of the trace. The reason is that a page is in that working set if and only if its final reference is within the last T window of the trace.

In other words, the term mc(T)-e(T)=mwh(T) is computed by adding the end-correction counts to the reuse counts c(k). We call the miss count with warm start and corrected end intervals the “warm start hot finish” miss count. It is a warm start because initial page faults are not counted. It is “hot finish” because the end corrections are applied at the very end by pretending that all M pages are simultaneously referenced at time N+1. This can be summarized as the working set size recursion:

Working Set Size Recursion

$$\mathit{st}(T+1) = \mathit{st}(T) + \mathit{mwh}(T)$$

where st(T) is the space-time accumulated by working sets of window T and mwh(T) is the warm start hot finish miss count. The initial conditions are st(0)=0 and mwh(0)=N.

Notice that for very long address traces (large N), the M end corrections are insignificant compared to the total of all the counters. For large N, mwh(T) converges to mc(T).

The recursion we reported in 1972 was mathematically the same, but it depended on the assumption that the reference string is a random (stochastic) process that enters a long-term steady state [den72]. In 1978 we removed the stochastic assumption and found the recursion works for finite address traces if we make end corrections to the counters. The working set recursion stated here has a much simpler derivation.

The working set recursion can also be run “backwards” to deduce the miss rate given the mean working set size. If we had a direct measurement of the mean working set size, we could find the miss counts by taking the differences mc(T)=st(T+1)-st(T) [xia13].

For completeness, we summarize the miss rate recursions:

Summary of Miss Rate Recursions

Miss rate recursions enable the linear-time calculation of miss counts at window size T+1 in terms of those for window size T. For miss counts:

$$\mathit{mc}(T+1) = \mathit{mc}(T) - c(T+1)$$

where c(k) is the count of reuse intervals of length k, with initial conditions c(0)=0 and mc(0)=N. For warm-start-hot-finish:

$$\mathit{mwh}(T+1) = \mathit{mwh}(T) - \mathit{cc}(T+1)$$

where cc(k)=c(k)+ec(k) is the count corrected for end intervals. The initial conditions are cc(0)=0 and mwh(0)=N.

The miss count measures mwh and mc both contain N counts; mc includes M first references but no end corrections, and mwh includes no first references but has M end corrections. As N gets large, both rates mc(T)/N and mwh(T)/N converge to m(T).

Example

Figure 10 shows the sequence of reuse intervals and end intervals for the example address trace. We tally these data along with the miss counts and working set space-time in the table below. The row for k=0 gives the initial conditions.

Figure 10. The top row is the address trace from Figure 7. The reuse intervals are shown in the middle row, where each “x” marks a first reference. The end intervals are shown in the bottom row, where each “x” marks a non-final reference.

Table 2

k     c(k)   ec(k)   mc(k)   mw(k)   mwh(k)   st(k)
0     0      0       15      10      15       0
1     1      1       14      9       13       15
2     1      1       13      8       11       28
3     1      1       12      7       9        39
4     5      1       7       2       3        48
5     1      0       6       1       2        51
6     0      1       6       1       1        53
7     0      0       6       1       1        54
8     0      0       6       1       1        55
9     0      0       6       1       1        56
10    0      0       6       1       1        57
11    1      0       5       0       0        58
12    0      0       5       0       0        58
13    0      0       5       0       0        58
14    0      0       5       0       0        58
x     5

These data are plotted in Figure 11 and compared with LRU and MIN. The graph shows a region at the smaller memory allocations where LRU is better than WS. A similar pattern is sometimes observed in actual programs [wan15, wir14].
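The mc, mw, mwh, and st columns of Table 2 can be regenerated from the c(k) and ec(k) columns alone, which is a useful check on the recursions. The sketch below copies the two histogram columns from the table; the function name `table2_rows` is an assumption for illustration.

```python
N, M = 15, 5                                   # trace length and page count of the example
c = {1: 1, 2: 1, 3: 1, 4: 5, 5: 1, 11: 1}      # nonzero c(k) entries of Table 2
ec = {1: 1, 2: 1, 3: 1, 4: 1, 6: 1}            # nonzero ec(k) entries of Table 2


def table2_rows(max_k=14):
    """Rows (k, mc, mw, mwh, st), stepping the window from k-1 to k."""
    rows = []
    mc, mwh, st = N, N, 0                      # initial conditions at k = 0
    for k in range(max_k + 1):
        if k > 0:
            st += mwh                          # st(k) = st(k-1) + mwh(k-1)
            mc -= c.get(k, 0)                  # intervals of length k become hits
            mwh -= c.get(k, 0) + ec.get(k, 0)  # same, with end corrections
        rows.append((k, mc, mc - M, mwh, st))  # mw = mc - M
    return rows


for row in table2_rows():
    print(row)
```

The whole table costs one addition and two subtractions per window size once the histograms are available.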

Figure 11. This graph compares three fault curves. A fault curve plots the number of faults (vertical axis) versus memory demand (horizontal axis). The solid curve is the WS example of Figure 8. LRU and MIN are also shown. For a fixed-space policy, the memory demand in space-time is the product of the memory size and address trace length; here the allowable memory sizes 1, 2, 3, 4, 5 for LRU and MIN correspond to space-times 15, 30, 45, 60, 75. In contrast, WS is variable space and is defined at points corresponding to non-integer average memory sizes. For this address trace, WS is mostly better than LRU and mostly worse than MIN, although at the largest window size, WS is slightly better than MIN. This is not an anomaly, but rather the consequence of working sets being variable in size.

Optimal Memory Policy

Now we will examine the optimal policies for managing memory. The optimal policy for Fixed Memory is MIN. It was defined by Les Belady in 1966 [bel66]. MIN’s principle, invoked at a page fault, is “replace the page that will not be reused for the longest time in the future.” This policy is unrealizable because the operating system cannot know the future. However, its fault count can be computed in a single pass of an address trace with about the same overhead as for LRU: order O(NM) [mat70].

The optimal policy for Shared Memory, in which the space allocated to a job can vary, is VMIN. It was defined by Barton Prieve and Robert Fabry in 1976 [pre76]. VMIN’s principle, invoked after each reference, is “retain the current reference in memory if its forward reuse interval is ≤T, otherwise delete it immediately.” This policy is also unrealizable. However, its fault function and mean size can be calculated rapidly as for WS: order O(N).

It is interesting that database designers discovered the same optimizing principle in the 1970s [gra85]. Here is their argument. Suppose we want to decide when to keep a page of data in main memory versus the much slower hard disk. Just after the page is used, we look into the future to see exactly when it will be used again. Then we do a simple calculation to compare the rental cost of keeping it in memory until next use with the cost of removing it immediately from memory and paying the swapping cost for its retrieval later. The database designers found that with typical parameters for memory cost and disk delay, a data page should be deleted if it has not been reused after about 5 minutes.

VMIN uses the window of size T as the threshold point at which retaining until next reuse costs the same as a page fault. To state this precisely, suppose time t+x is the next reuse of the page referenced at time t; VMIN decides whether to retain that page or not as follows:

If x>T then immediately evict the page from memory;

If x≤T then keep the page continuously in memory until next reuse.

VMIN exercises this choice at every time t. Because each and every reference is evaluated for minimum cost, the total cost must be minimum too.

VMIN is just like WS, but with a forward looking rather than backward looking window [den80]. If the page is used again in the forward window, VMIN retains it in memory until next use. If the page is not used again in the forward window, VMIN immediately removes it from memory. VMIN’s page fault sequence is identical to WS.
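The claim that the two policies fault at exactly the same references can be checked by simulating both on a recorded trace. The sketch below is an illustration under stated assumptions: cold start for both policies, a made-up trace, and invented function names. The VMIN simulation peeks ahead in the trace, which is only possible offline.

```python
def ws_miss_times(trace, T):
    """Times at which WS (backward window T) misses."""
    last, misses = {}, []
    for t, page in enumerate(trace, start=1):
        if page not in last or t - last[page] > T:
            misses.append(t)
        last[page] = t
    return misses


def vmin_miss_times(trace, T):
    """Times at which VMIN (forward window T) misses."""
    N = len(trace)
    resident, misses = set(), []
    for t, page in enumerate(trace, start=1):
        if page not in resident:
            misses.append(t)
        # forward reuse interval x of this reference, or None if never reused
        x = next((u - t for u in range(t + 1, N + 1) if trace[u - 1] == page), None)
        if x is not None and x <= T:
            resident.add(page)        # keep continuously until the next reuse
        else:
            resident.discard(page)    # evict immediately
    return misses


trace = [1, 2, 3, 2, 1, 4, 4, 5]      # hypothetical trace
print(ws_miss_times(trace, 2), vmin_miss_times(trace, 2))
```

The two policies differ only in how long a page stays resident after a reference that will not be followed by a reuse within T.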

When the reuse interval x is ≤T, WS and VMIN retain the page continuously between the two successive references. If x>T, WS retains the page for additional time until evicting it, whereas VMIN evicts it immediately. This means that the space-time difference between WS and VMIN is due entirely to the overhangs in the reuse intervals longer than T. This observation allows us to calculate the VMIN space-time vt(T) directly from reuse counters. Start with vt(1)=N because with window size 1 only the current references are in the VMIN memory. A reuse interval of length k≤T contributes an additional k-1 to the space-time. Thus,

$$\mathit{vt}(T) = N + \sum_{k=1}^{T} (k-1)\,c(k)$$

We get a recursion straightway,

$$\mathit{vt}(T+1) = N + \sum_{k=1}^{T+1} (k-1)\,c(k) = N + \sum_{k=1}^{T} (k-1)\,c(k) + T\,c(T+1)$$

or simply,

$$\mathit{vt}(T+1) = \mathit{vt}(T) + T\,c(T+1)$$

Therefore, as with WS, the VMIN space-time and miss rate can be calculated directly from the reuse interval counters c(k), without a simulation of VMIN (see footnote 10).

We can calculate the difference of WS and VMIN space-time simply as the total overhang in reuse intervals >T. Specifically, the overhang is T-1 in any reuse or end interval >T; by our previous calculations there are mwh(T) of these. For all reuse intervals of length k≤T, there is no overhang, but end intervals of length k≤T have an overhang of k-1. This leads to the relation

$$\mathit{st}(T) = \mathit{vt}(T) + (T-1)\,\mathit{mwh}(T) + \sum_{k=1}^{T} (k-1)\,\mathit{ec}(k)$$

Figure 12 illustrates this for the data of the previous table.

The lesson from all this math is that we can quickly compute the space-time of a VMIN policy from the same reuse interval statistics as for WS.
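As a concrete check, the VMIN space-time and its relation to the WS space-time can be evaluated on the histogram data of Table 2. The counts below are copied from that table; the function names are assumptions for illustration.

```python
N = 15
c = {1: 1, 2: 1, 3: 1, 4: 5, 5: 1, 11: 1}      # c(k) from Table 2
ec = {1: 1, 2: 1, 3: 1, 4: 1, 6: 1}            # ec(k) from Table 2


def vt(T):
    """VMIN space-time: N plus k-1 for every reuse interval of length k <= T."""
    return N + sum((k - 1) * n for k, n in c.items() if k <= T)


def mwh(T):
    """Warm-start-hot-finish miss count: reuse and end intervals longer than T."""
    return (sum(n for k, n in c.items() if k > T)
            + sum(n for k, n in ec.items() if k > T))


def st(T):
    """WS space-time from the recursion st(T+1) = st(T) + mwh(T), st(0) = 0."""
    return sum(mwh(j) for j in range(T))


def overhang(T):
    """Total WS overhang: T-1 per long interval, k-1 per short end interval."""
    return (T - 1) * mwh(T) + sum((k - 1) * n for k, n in ec.items() if k <= T)


for T in (1, 4, 5, 14):
    print(T, vt(T), st(T), overhang(T))
```

At T=4 this gives a VMIN space-time of 33 against 48 for WS, the difference of 15 being the overhang.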

Footnote 10. By comparison, the WS space-time recursion has a miss-rate term mwh(T), the sum of counters, instead of a single counter c(T+1). That is because when we increase T to T+1, the WS overhang grows in all the reuse intervals >T, the total number of which is the sum of all the counters c(k) for k>T. VMIN is parsimonious: increasing T to T+1 only adds the space-time of reuse intervals of exactly length T+1.

Figure 12. This graph plots the VMIN space-time and the WS space-time. If we pick any value of miss count, we can compare the VMIN and WS memory demands. For example, at miss count 8, VMIN space-time is 30 page-units and WS space-time is about 47; the VMIN memory demand is about 1/3 smaller than WS for the same paging rate. Notice that the VMIN curve is always convex, whereas WS is not. Because the example address trace is too short to exhibit locality, WS and VMIN are not close.

Good Locality Makes Working Set Near Optimal

We would like to clarify a point about the assumptions behind the mathematical formulas given above. The only assumption is that an address trace can be recorded from a computation. The recursion formulas are simply mathematical relationships for measures computed on address traces. The working set and fault measures simply count events in the address trace. Optimality defines the best that can be done on a given address trace. There are no assumptions about locality built into the working set and optimality measures. Working sets are unbiased means to measure whether locality is present or not.

The page reference map is constructed from an address trace. Locality appears in the distinctive phase-transition patterns of the map. The working set was devised to measure the locality sets seen in the map. It is simply a measurement tool. It makes no assumptions about locality.

Locality comes in when we want to claim that WS is close to the optimal VMIN. Let’s see how that works with the reference map in Figure 1.

Consider a locality set of P pages and its associated phase consisting of L sample windows of length T. Each sample window within the phase contributes PT space-time. VMIN and WS have the identical memory contents everywhere in the phase except for the first and last sampling windows. In the first window of the phase, they both acquire the P pages of the new locality at the same moments of page faults; but WS has some additional space-time because it retains pages from the previous phase. Let f and g be the fraction of PT used respectively by WS and VMIN in the first sample window and h be the fraction of PT used by VMIN in the last window. Then the space-time for WS across the whole phase is fPT+(L-1)PT. For VMIN it is gPT+(L-2)PT+hPT. The ratio of WS to VMIN is

$$\frac{\mathit{st}(T)}{\mathit{vt}(T)} = \frac{fPT + (L-1)PT}{gPT + (L-2)PT + hPT} < \frac{L}{L-2}$$

The inequality results from replacing f with its upper bound (1) in the numerator and g and h with their lower bounds (0) in the denominator.

Figure 1 shows 5 locality sets and their associated phases for a sampling window T of approximately 380K units. A measurement of the graph to find the locality set sizes and durations yields Table 3, where size is the number of pages in the locality set and length is the number of sample intervals in a phase.

Table 3

set   size   length   L/(L-2)
1     50     180      1.01
2     65     220      1.01
3     30     220      1.01
4     55     50       1.04
5     35     180      1.01

The final column in the table above shows the bounding ratio for each of the phases of Figure 1. For the longer phases, VMIN and WS are about 1% apart. For the short phase, they are up to 4% apart. A more precise analysis would have narrower separations than this approximate analysis.
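The bounding ratios are simple arithmetic on the phase lengths. As a small worked check, the snippet below recomputes the last column of Table 3 from its length column; the function name is an assumption for illustration.

```python
def ws_vmin_bound(L):
    """Upper bound L/(L-2) on the WS to VMIN space-time ratio for a phase of L windows."""
    return L / (L - 2)


phase_lengths = [180, 220, 220, 50, 180]       # length column of Table 3
print([round(ws_vmin_bound(L), 2) for L in phase_lengths])
```

The bound depends only on the phase length, not on the locality set size, which is why sets 2 and 3 share the same ratio despite their different sizes.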

VMIN's virtual space-time is less than WS's. How does this translate to real space-time as needed by the memory space-time law? In our notation the law becomes st(T)(1+Dmc(T)). Since WS and VMIN have identical paging, the expression for VMIN real space-time is vt(T)(1+Dmc(T)). Thus VMIN's minimum virtual space-time translates directly to minimum real space-time and maximum throughput. This is why, when WS is close to VMIN, their respective system throughputs are close.

In summary, working set analytics provides tools for measuring locality. When locality is present, a working set memory management policy will set system throughput to within a few percent of optimal.

Comparisons

Memory policies can be classified in two dimensions: fixed or variable space, optimal or not. We have chosen representatives of each combination:

Table 4

              Fixed space   Variable space
Not optimal   LRU           WS
Optimal       MIN           VMIN

The optimal policies are generally not realizable in real time because they require knowledge of the future reference patterns. Although the optimal policies are not realizable, their performance is easy to compute from a recorded address trace. Therefore, it is easy to measure how far a realizable policy (LRU or WS) is from optimal.

Fixed space policies are used for memories of fixed size such as caches or hard partitions of memory in an operating system. Fixed space policies are subject to thrashing when extended to shared memory or cache, because the space allocated to every job decreases as the number of jobs increases. The WS policy (Figure 6) is resistant to thrashing because it cannot steal pages from other processes and because the scheduler will not load new processes if their working sets do not fit into the available free space of memory.


What happens when these policies are applied for jobs displaying a strong degree of locality? When the memory allocated by the policy is too small to hold the job's locality sets, LRU and WS would be comparable, but with poor performance. When memory is sufficient to hold locality sets, LRU and MIN would be comparable, and WS better than both because the fixed-space policies retain pages not being used in a locality set.

Moreover, LRU is also susceptible to thrashing when extended to shared memory. WS will not thrash and it is close to optimal.

Choosing the Window Size

What is a good window size? Is there a best window size for each process? What would happen if all processes were measured by the same window size?

This question has been investigated by researchers dating back to the 1970s. One basic finding is that graphs of WS space-time versus T show that the minimum of st(T) is in a wide, near-flat plateau in most jobs. There is usually a single value of T that intersects all the plateaus of the jobs. In other words, it does not make much difference what value of T you choose; T can be a single, global value chosen over a wide range without significantly changing the system throughput.

From the page reference maps, we see that an ideal T is just large enough to see all the locality pages in the window throughout the phase. In Figure 1, for example, T=380K is sufficient to see all the pages of the locality set. That value of T is a small fraction of the phase length: about 0.5% for the four long phases (one sample interval out of 180 to 220) and 2% for the one short phase (one out of 50).

This gives a practical answer to the performance tuning problem mentioned at the beginning. A system administrator can adjust the parameter T in a WS-managed system to find a value that maximizes system throughput. After that, the parameter T does not need to be changed. That maximum will be close to the theoretical optimum of VMIN.

Implementations

There are several ways to implement the working set cheaply. The original ideas for implementation (1968) were of two kinds [den68a]. The first was to have the operating system scan and reset the use bits in page tables every T time units of virtual time, removing any unused pages from working sets. This implementation was not attractive because scanning all the page tables every T time units in a large system would be very expensive.

The second idea was to associate a hardware timer of duration T with each page frame of memory. The timer is set to T at each access to that frame. It ticks down to 0 after T time units of non-use. The operating system can detect these unused frames from their expired timers and remove them from their working sets. This implementation was not attractive at the time because the required hardware would be too expensive and, moreover, because of CPU multiplexing, the timers would need to be shut off when the job owning the page was not running.

Chen Ding and his students at the University of Rochester note that in modern caches, multiple CPUs execute in parallel and share on-chip L1 and L2 cache. The hardware is easily designed to include timers on cache pages. Because there are multiple cores, there is no need to shut any timers off to accommodate round-robin CPU scheduling. Their proposal of a "lease cache", in which each cache page has its own timer, is now feasible [li19]. A lease cache with lease T is exactly the working set policy with window T.
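The equivalence between a lease of T and a working set window of T can be checked on a small trace. The following is a minimal sketch, not the design in [li19]: it assumes one clock tick per reference, a countdown lease stored per page, and hypothetical function names. The working set W(t, T) is taken to be the set of distinct pages among the last T references.

```python
def working_sets_by_window(trace, T):
    """W(t, T) for every t: distinct pages among the last T references."""
    return [set(trace[max(0, i - T + 1): i + 1]) for i in range(len(trace))]


def working_sets_by_lease(trace, T):
    """Resident set after each reference under a per-page countdown lease of T."""
    if T < 1:
        raise ValueError("lease must be at least one time unit")
    lease = {}       # page -> remaining lease, in ticks
    snapshots = []
    for page in trace:
        # One clock tick: every resident page loses one unit of lease.
        for p in list(lease):
            lease[p] -= 1
            if lease[p] == 0:
                del lease[p]     # lease expired, page leaves the resident set
        lease[page] = T          # any access, hit or miss, renews the lease
        snapshots.append(set(lease))
    return snapshots


trace = list("abcabcddeeddabc")   # illustrative trace, not from the paper
for T in range(1, 6):
    same = working_sets_by_lease(trace, T) == working_sets_by_window(trace, T)
    print(f"T={T}: lease residents match working sets: {same}")
```

A page last referenced at time u has T - (t - u) ticks of lease left at time t, so it is resident exactly while t - u < T, which is the condition for membership in the window.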

A simple modification of the popular CLOCK algorithm enables the operating system to detect the working set pages of a job. As before, the process's pages are arranged in a circular list, but now each page has a timestamp instead of a use bit. The timestamp records the time of the last access to the page. The scanning clock hand skips over all pages whose timestamp is within T of the current time, and stops at the first page with an old timestamp. This method, called WSCLOCK, was invented and validated by Richard Carr [car81].
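The scan described above can be sketched as follows. This is a simplified illustration, not Carr's full algorithm in [car81]: the `Frame` class and the function name are hypothetical, timestamps are integers of virtual time, and a page is treated as old when it has gone T or more time units without an access. The function also reports how many frames the hand examined.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: str
    last_used: int   # virtual time of the most recent access to this page


def wsclock_find_victim(frames, hand, now, T):
    """Scan the circular list from `hand` for the first page outside the working set.

    Returns (index, steps), where `steps` is the number of frames examined,
    or (None, len(frames)) when every page was used within the last T time units.
    """
    n = len(frames)
    for steps in range(1, n + 1):
        idx = (hand + steps - 1) % n
        if now - frames[idx].last_used >= T:   # old timestamp: not in the working set
            return idx, steps
    return None, n


frames = [Frame("a", 95), Frame("b", 99), Frame("c", 40), Frame("d", 98)]
print(wsclock_find_victim(frames, hand=0, now=100, T=10))
print(wsclock_find_victim(frames, hand=3, now=100, T=10))
```

A return value of `None` means that all resident pages belong to some working set, so no page can be taken without shrinking a working set.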

Note that WSCLOCK has no built-in load control. By itself, it cannot implement the full WS multiprogramming policy (Figure 6), which does not load another job until there is sufficient free space for its working set. However, the WSCLOCK search time for a page with an old timestamp can be a proxy for the measure of free space: the longer the search time, the less free space is available. The memory allocator can load a new job when the search time is below a set threshold. (Note that the memory allocator is not the same as the CPU core scheduler. The memory allocator decides when to load a job's working set into memory. The scheduler assigns free CPU cores to jobs whose working sets are loaded.)

Application in Modern Caches

Modern computer chips rely on complex hierarchies of caches to achieve and maintain high performance. Typical cache levels L1, L2, and L3 buffer the gap between the RAM and the CPU. Level L1 is closest to the CPU and is the smallest and fastest of the caches. There are two kinds of cache configurations. Inclusive cache means that cache pages (also called slots or blocks) of a level are subsets of larger pages at the next level down. Exclusive cache does not require this subsetting [ye17]. These caches usually use some form of LRU replacement to determine when a cache page is pushed down to the next lower level cache. The highest-level caches (L1 and L2) are dedicated to individual CPU cores, while the L3 cache is shared among all the cores, similar to multiprogramming [ye17]. The shared L3 cache may be partitioned unequally, with some cores getting more than others depending on their locality [bro15]. Researchers are investigating whether a variable partition based on CPU core working sets would be more efficient.


Chen Ding's research group looks deeply into policy and performance questions for shared caches [lia19, xia11, xia13, xia18]. Many of their analytic methods for caches were inspired by the working set theory. They defined two measures of load on the cache, called footprint and footmark. I'll discuss both of them and compare with WS.

The footprint measure was motivated by a desire for accuracy. For the first T-1 references of a trace, the working sets are the contents of a truncated window because there are no page references prior to time t=1. In fact, the initial working sets for the first T-1 references are of sizes w(t,t). Those truncated windows present a potential underestimate in the calculations of mean working set size. Chen Ding's footprint avoids these complications by averaging together all the working sets whose T windows fit completely inside the address trace:

$$
fp(T) = \frac{1}{N-T+1} \sum_{t=T}^{N} w(t,T)
$$

Ding argued that the derivative of the footprint function is the miss rate, just as indicated by the working set recursion. Thus, the miss rate can be computed once the footprint is computed. His team found a recursion for computing fp(T) from fp(T-1) and showed that the mean footprint is within one page of the mean working set if the window is not too large:

$$
s(T) - fp(T) < 1, \quad \text{if } T \le \sqrt{2N}
$$

This will be easily true for programs with good locality. Details are in the Appendix.
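Both measures are easy to compute by brute force on a short trace, which makes the bound easy to check. The sketch below is an illustration under stated assumptions: the function names and the two-phase trace are hypothetical, time is 1-based, and w(t, T) is the number of distinct pages in the (possibly truncated) window of up to T references ending at time t. It is quadratic in the trace length, so it is a checking tool, not one of the linear-time algorithms.

```python
import math


def ws_size(trace, t, T):
    """w(t, T): distinct pages in the window of up to T references ending at time t."""
    start = max(1, t - T + 1)
    return len(set(trace[start - 1:t]))


def mean_working_set(trace, T):
    """s(T): average of w(t, T) over t = 1..N, truncated windows included."""
    N = len(trace)
    return sum(ws_size(trace, t, T) for t in range(1, N + 1)) / N


def footprint(trace, T):
    """fp(T): average of w(t, T) over t = T..N, the windows that fit in the trace."""
    N = len(trace)
    return sum(ws_size(trace, t, T) for t in range(T, N + 1)) / (N - T + 1)


# Illustrative trace with two phases of three pages each (not from the paper).
trace = list("abcabcabcabc" + "defdefdefdef")
limit = math.isqrt(2 * len(trace))          # largest integer T with T <= sqrt(2N)
for T in range(1, limit + 1):
    s, fp = mean_working_set(trace, T), footprint(trace, T)
    print(f"T={T}: s(T)={s:.3f}  fp(T)={fp:.3f}  difference={s - fp:.3f}")
```

Because the truncated windows can only lower the average, the printed difference is never positive on this trace; the bound of one page limits how far apart the two measures can be.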

Chen Ding was not happy with the messiness of algorithms for computing footprint. He searched for a recursion based on footprint that was identical to the one reported by Denning and Schwartz for long address traces in stochastic steady state. He and his students found a new, closely related measure they called footmark [yua18]. Footmark satisfies the Denning-Schwartz recursion

$$
fm(T+1) = fm(T) + m(T)
$$

with initial conditions fm(0)=0 and m(0)=1. The details of its derivation are in the Appendix.

We now have two recursions, working set size and footmark, that satisfy the same mathematical form,

$$
F(T+1) = F(T) + M(T)
$$

where F is a space measure and M a misses measure. The working set uses for M the warm-start-hot-finish miss rate mwh(T) and footmark the actual working set miss rate mc(T). For finite address traces, the two miss rates differ by a few end corrections. But for very long traces (large N), they become identical and the two recursions are the same. That replicates the 1972 finding for the steady-state values of working set size s(T) [den72]. Like footprint, the footmark measure is within one page of the mean working set size when T ≤ √(2N). Figure 13 illustrates these measures for the example address trace used previously. When there is a risk that T > √(2N), the best strategy is to use the working set recursion, which is fully accurate and is easily computed from mwh(T), which in turn differs from mc(T) by a few end corrections.

Figure 13. This graph compares the footprint (FP), mean working set (WS), and footmark (FM) measures for the 5-page example address trace. The working set rises asymptotically toward 4.0 pages. Both the footprint and footmark continue growing. When T ≤ √(2N) = 5 (vertical line), footprint and footmark are within 1 page of WS as predicted by the bound. For example, in the page reference map of Figure 1, there are over 1000 sample intervals; T could be as large as 45 (= √2000) sample intervals without causing significant error between footmark and mean working set. When N is very large (not shown in this diagram), the two measures FM and WS converge to the same value.


Working Sets in Parallel Environments

The computing environments in which virtual memory was formulated featured a single CPU accessing a memory shared by many jobs. It was basically a serial environment. The CPU of old has evolved into modern multicore chips, graphics processors (GPUs), and several kinds of cache. Does all this parallelism invalidate any assumptions of the analytics? Do graphics and neural networks, the primary applications of GPUs, exhibit locality?

As for the first question, the assumptions of working set analytics do indeed apply in a parallel environment. The analytics depend only on the assumption that an address trace can be measured. A particular run of a parallel system yields an address trace for that run. The next run of the same system may yield a different address trace because the order of events is different or the input data are different. But this does not create any problems for the analytics, because the analytics are defined for a given address trace. WS will reveal the locality sets of that trace and adapt to them. VMIN will reveal the best possible performance for that trace. What happens in other traces does not affect the analytics (see footnote 11).

As for the second question, consider that a system is a set of CPUs accessing a set of pages in memory. Whenever any CPU, running any job, accesses a page, the page's lease is set to T and counts down toward zero with each clock tick. The protocol for deciding whether to keep a page in memory is the same whether or not jobs share pages or run in parallel.

Whether the lease rule optimizes system throughput depends on the mix between parts of jobs with good locality and parts without. There is much debate around this issue. A GPU working with streaming data (for example, running a graphics display) or with simulation of a neural network (doing the linear algebra calculations for each layer) may not exhibit phase-transition behavior. But the rest of the computation, such as feedback for reinforcement learning in a neural network and the user interface, is likely to exhibit good locality behavior. This means that locality is important for optimizing the performance of the non-GPU components and managing communication between job components running on conventional CPUs and those running on GPUs. Other optimizations, such as streaming caches, can be applied to the GPU parts.

Footnote 11: In his book Rethinking Randomness, Jeff Buzen discusses how many aspects of computer systems can be represented as event traces [buz15]. The traditional algorithms of queueing theory for utilization, throughput, and response time work for these traces based purely on what can be observed in the trace. The address trace and working set analytics are of this kind.

Wider Application of Working Set Principles

The principles of memory management in the working set model have been used outside the traditional operating systems that hatched them. The two most prominent are content delivery in the Internet and management of inventories and logistics networks.

Start with content delivery. The Internet hosts major services that distribute data throughout the world. Examples are distribution of music, video, books, and cloud services. Even though these services look like a single entity, they are built as distributed networks because congestion at a single server would be too great for acceptable service. The networks include caches located in busy zones, such as major cities, so that users in those areas can access the content via physically short, high-bandwidth paths. Content is automatically downloaded to a local cache when a user requests new data. Data that have been resident for a long time are automatically refreshed at intervals.

These caches are often called "edge caches" because they represent moving data away from centralized locations that are easily congested, to the edge of the network where the users connect. Numerous companies employ edge caches as part of their content delivery services.

An edge cache functions in the same way as a cache inside the operating system. When a block of data is accessed for the first time, it is loaded into the edge cache from a central server. An LRU replacement policy or a lease policy is used to remove old data when the cache is full. These caches accumulate working sets and achieve high performance by keeping the working sets physically close to their users. All caches perform well by exploiting the principle of locality in the patterns of how users access data.

Turn now to inventory management. An inventory is a set of items or parts. A logistics network is a network of depots that hold inventories of parts used by companies or militaries. When applied to inventories, a working set is a set of parts that have been used recently. Depot managers aim to keep local depots filled with the most needed working parts so that they can supply their local users rapidly. When someone asks for a part not in the local depot, the manager must send a request to another depot to obtain the part.

Given the prevalence of locality behavior, it is reasonable to expect user access patterns for inventories to exhibit locality sets (subsets of possible parts) and phases (intervals of heavy use of a particular locality set). Working set analytics could be used to determine the capacity needed of an inventory depot.

There is a difference between a "logistics working set" and a "memory working set". The logistics working set includes multiple instances of a part, whereas the memory working set has just one instance of a page. The number of requests for a particular item in the working set window forecasts the quantity of that item to keep on hand. It would be worthwhile to study this idea of logistics working sets and their phase transition maps to see what inventory management strategies give optimal performance of the logistics network.
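One way to picture the difference is to keep a count per part instead of a single presence bit. The sketch below is only an illustration of that idea, not a studied inventory policy: the function name, the part names, and the choice to measure the window in requests rather than in clock time are all assumptions.

```python
from collections import Counter, deque


def logistics_working_sets(requests, T):
    """Yield, after each request, the count of each part among the last T requests."""
    window = deque()     # the last T requests, oldest first
    counts = Counter()   # part -> number of requests inside the window
    for part in requests:
        window.append(part)
        counts[part] += 1
        if len(window) > T:
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]   # the part has dropped out of the working set
        yield Counter(counts)


requests = ["bolt", "bolt", "gear", "bolt", "gear", "gear"]   # hypothetical demand
for step, ws in enumerate(logistics_working_sets(requests, T=3), start=1):
    print(step, dict(ws))
```

The keys of each counter form the memory-style working set; the values are the multiplicities that a depot manager might use as a first estimate of stock levels.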


Memory Misconceptions

Before closing this tutorial, I would like to comment on four common misconceptions about computer memory. They result from unfounded assumptions that lead to pessimistic answers to four big questions:

1. Is memory flattening?
2. Is virtual memory obsolete?
3. Is computer memory irrelevant in the Cloud?
4. Is the Principle of Locality obsolete?

Is Memory Flattening?

Memory is an essential component of computers. Most people think of memory as a companion technology that the CPU interacts with to fetch instructions and access data. Performance of the computer then seems to be governed by the CPU speed.

But this is not so. Memory is not a single component as is a CPU chip. Memory is a system that includes caches in the CPU, RAM, disks, network services, and Cloud storage. Whereas a CPU core runs only one job at a time, memory is shared among many jobs. Memory contention births queueing at high-demand servers. Disk and network bottlenecks at these queues will kill performance of memory systems. As discussed in this tutorial, we have learned how to organize the allocation of memory and data transfers so that we avoid these bottlenecks.

A thought experiment highlights this basic reality about memory. A lot of people believe the claim that within a few years we will know how to make memories that are essentially infinite and flat. Flat means that the access time of any item in the memory is approximately the same. Flat memory would not need locality optimization because every page would have the same access time. But flat memory is not a thing of the future. It is already here: the Internet. If you issue a "ping" command from anywhere in the Internet for any IP address in the Internet, you will see that the packet round trip times are mostly 30-90 milliseconds. That is not perfectly flat but is close. Does that mean your computer will experience a maximum delay of 90 milliseconds when it reaches out to a web server? Of course not. If the server is very popular, many computers all over the Internet will be trying to access it. The server will queue up the requests because its disks are a bottleneck. The greater the demand for the popular web server, the longer the wait time in the queue, and the longer the response time to get a web page. In other words, flat memory systems do not guarantee good performance. This is why the Internet contains so many edge caches: they spread the load and avoid the queueing.

The same problem appears at smaller scale when many jobs share RAM together and contend for use of the paging disks. Working set memory management prevents disk overload and protects the system from thrashing.


The control systems that prevent excess delay in accessing shared memory resources are an important part of the memory story. Most users and programmers are not aware of those control systems.

Is Virtual Memory Obsolete?

Virtual memory was invented to overcome the high programming cost of solving the "overlay problem": manually planning page transfers that overwrite previously loaded pages. Today's memories are usually large enough to contain an entire program and its data. Thus, it would seem that virtual memory is not useful anymore for solving the overlay problem.

The overlay problem is much less important than it was in early virtual memories. Virtual memory's real value comes from its ability to partition memory. It allocates nonoverlapping subsets of pages to each job's memory and prevents any job from accessing data in another job's memory. Virtual memory provides the basis to encapsulate untrusted software so that it cannot damage anything outside its allocated memory space. Virtual memory provides basic access control by distinguishing read, write, and execute permissions for individual pages. In other words, virtual memory implements the basic guarantees of an operating system for data protection.

Virtual memory does more than partition memory and provide basic access control. It also manages multiprogrammed RAM to avoid thrashing and to maximize system throughput. The same concerns for stability and throughput now occur in the cache on the CPU chip, which is shared among the multiple CPU cores on the chip. The principles of virtual memory are important in cache design.

Is Computer Memory Irrelevant in the Cloud?

The Cloud provides very large storage in the network outside your computer. It does not include the storage inside your computer. Your computer still needs L1, L2, L3 cache, local RAM, and local storage. The operating system on your computer needs to manage these local memory resources. The Cloud enlarges the storage accessible to your computer but does not replace storage management within your computer.

The Cloud is a complex, distributed storage system. It consists of many data centers around the world. Each data center includes tens of thousands of computers and disks. Sophisticated redundancy controls manage multiple copies of files distributed across multiple data centers, providing high reliability and ease of recovery from failures of computers and disks. The operating systems running the Cloud must also manage memory, avoid thrashing, and maximize Cloud throughput.

The Cloud is not the final answer to storage management. One of the primary limitations on the performance of computers lies in the interfaces between CPUs and the memory system. Moving data across those interfaces significantly slows CPU speeds. The Cloud is another interface. Those interface delays are often called von Neumann bottlenecks because the separation of CPU and memory was a central principle of the stored-program computer architecture that became ubiquitous after 1945. Many research projects today are examining new architectures that have no von Neumann bottleneck. The idea is to build the computer so that computations are performed by huge numbers of processing elements that require only local access to limited storage. These are often called "processing-in-memory architectures". Neural networks are a prominent example.

Is the Principle of Locality Obsolete?

The heart of all the methods to control memory and optimize its performance is the Principle of Locality. Each program generates a unique footprint of locality sets and phases. High performance caches, RAM-DISK interfaces, and networks all owe their success to this principle.

The locality principle runs deep in computing. Algorithm theorists have shown that a procedure cannot be an algorithm unless each of its operations can apply only to a bounded, local set of data. Computing machines themselves are built from many components and modules that use only local inputs.

Yet the locality principle for memory is often misunderstood. Some people believe it simply means unequal frequencies of use of each page. Others believe it means a slow drift among a set of favored pages. Few see it as long phases of near-constant locality sets punctuated by sharp transitions to new locality sets. Yet the phase-transition behavior is common and is the reason why working set is able to give near-optimal system throughput.

Another misunderstanding is the belief that incentives have changed. In the early days, programmers of early virtual memory systems purposely induced locality behavior in order to get the best performance from the paging algorithms. We are in a different age now: memory is nowhere near as scarce and programmers feel no pressure to induce working sets into their programs. Thus, it would seem that the motivation for locality is gone. This misunderstanding is refuted by two facts. One is the body of experimental studies showing that locality behavior is present in the source code of programs: the phase-transition behavior appearing in page reference maps is an image of source locality. Locality is the consequence of our problem-solving strategies such as iterative loops and divide-and-conquer. The other is recent instrumentation showing page reference maps with even more pronounced phase-transition behavior than we saw in the early days. The increased use of modular programs is the most likely explanation. (See McMenamin's study of Linux programs [mcm11].)


Conclusion

The Working Set Model for program behavior, first articulated in 1968, has stood the test of time. It stimulated the research that revealed that almost all executing processes exhibit locality, establishing the Principle of Locality as one of the fundamental principles of computing. It led to a simple, precise way to measure all the working set statistics in real time by recording the reuse interval statistics of an address trace. It led to the conclusion that WS is near-optimal for processes exhibiting locality. It provided an explanation for the mystery of thrashing and showed how to build a control system to avoid thrashing.

Since 1968, computer CPUs have become progressively faster, widening the cost gap of retrieving a page from a lower level of memory. At the same time memory has become far cheaper so that most routine applications are fully loaded in memory and do not normally cause page faults. Some people have asked whether the theory applies to real systems today.

Yes, it does. Real systems today have memories consisting of multiple levels of cache, RAM, and disk. As noted earlier, the caches nearest the CPU (L1 and L2) are dedicated to individual cores, while the L3 cache is shared among multiple cores (CPUs). Most of the current cache management strategies do not vary the partition of the shared cache among the cores, but researchers have been demonstrating that variable partition caches are much more efficient. Unfortunately, variable partition LRU caches are susceptible to thrashing. The WS theory can help in designing cache management strategies, such as lease cache, that do not thrash and that keep cache misses close to the optimal levels.

Despite the tremendous advances in memory technology over the past half century, the basic assumptions behind memory management have not changed. Virtual memory remains useful because of its ability to confine jobs to their limited regions of memory, to encapsulate untrusted software, and to manage load to avoid thrashing. Flat memory does not eliminate the need for virtual memory; it introduces its own problems due to queueing and congestion as many jobs access shared memory resources. The Cloud augments but does not replace memory management on local computers. Moreover, the Cloud exacerbates the von Neumann bottleneck between CPU and memory. The locality principle is far from obsolete; it continues to underpin high performance memory systems. The working set theory can be extended and combined with network caching theory for possible application in logistics networks and inventory management.

Working sets and virtual memory will be parts of computing for a long time to come.

Bibliography

[aho71] Aho, A., P. J. Denning, J. Ullman. 1971. Principles of optimal page replacement. J. ACM 18, 1 (January), 80-93.


[bel66] Belady, L. A. 1966. A study of replacement algorithms for virtual storage computers. IBM Systems J. 5, 2, 78-101.

[bro15] Brock, J., C. Ye, C. Ding, Y. Li, X. Wang, and Y. Luo. 2015. Optimal cache partition sharing. In 44th IEEE Int'l Conf. on Parallel Processing (September), 749-758.

[buz76] Buzen, J. P. 1976. Fundamental operational laws of computer system performance. Acta Informatica 7, 2, 167-182.

[buz15] Buzen, J. P. 2015. Rethinking Randomness. Amazon.com Platform. See also ACM Ubiquity interviews of the author, https://ubiquity.acm.org/article.cfm?id=2986329 and https://ubiquity.acm.org/article.cfm?id=2986331

[car81] Carr, R. W. and J. Hennessy. 1981. WSCLOCK -- a simple and effective algorithm for virtual memory management. In Proceedings of the Eighth ACM Symposium on Operating Systems Principles (SOSP '81).

[den68a] Denning, P. J. 1968. The working set model for program behavior. Communications of ACM 11, 5 (May), 323-333.

[den68b] Denning, P. J. 1968. Thrashing: its causes and prevention. In Proceedings of the 1968 Fall Joint Computer Conference (FJCC), part I (AFIPS '68 (Fall, part I)).

[den72] Denning, P. J. and S. C. Schwartz. 1972. Properties of the working set model. Communications of ACM 15, 3 (March), 191-198.

[den78a] Denning, P. J. and D. L. Slutz. 1978. Generalized working sets for segment reference strings. Communications of ACM 21, 9 (September), 750-759.

[den78b] Denning, P. J., and J. P. Buzen. 1978. The operational analysis of queueing network models. ACM Computing Surveys 10, 3 (September), 225-261.

[den80] Denning, P. J. 1980. Working sets past and present. IEEE Trans. Software Engineering SE-6, 1 (January), 64-84.

[den16] Denning, P. J. 2016. Fifty years of operating systems. Communications of ACM 59, 3 (March), 30-32.

[fot61] Fotheringham, J. 1961. Dynamic storage allocation in the Atlas computer, including an automatic use of a backing store. Communications of ACM 4, 10 (October), 435-436.

[gra76] Graham, G. S. 1976. A study of program and memory policy behavior. PhD dissertation, Purdue University Dept of Computer Science.

[gra85] Gray, Jim, and Franco Putzolu. 1985. The 5 minute rule for trading memory for disk accesses. Tandem Corporation Technical Report 86.1. Available at https://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf


[kah76] Kahn, K. C. 1976. Program behavior and load dependent system performance. PhD dissertation, Purdue University Dept of Computer Science.

[kil62] Kilburn, T., D. B. G. Edwards, M. J. Lanigan, F. H. Sumner. 1962. One-level storage system. IRE Trans EC-11 (April), 223-235.

[li19] Li, P., C. Pronovost, W. Wilson, B. Tait, J. Zhou, C. Ding, and J. Criswell. 2019. Beating OPT with statistical clairvoyance and variable size caching. In Proc. 24th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (April), 243-256.

[lia19] Liang Yuan, Chen Ding, Wesley Smith, Peter Denning, and Yunquan Zhang. 2019. A relational theory of locality. ACM Transactions on Architecture and Code Optimization (TACO) 16, 3 (August), 1-26.

[mad76] Madison, A. W. and A. P. Batson. 1976. Characteristics of program localities. Communications of ACM 19, 5 (May), 285-294.

[mat70] Mattson, R., J. Gecsei, D. R. Slutz, and I. L. Traiger. 1970. Evaluation techniques for storage hierarchies. IBM Systems Journal 9, 78-117.

[mcm11] McMenamin, A. 2011. Applying Working Set Heuristics to the Linux Kernel. Masters Thesis, Birkbeck College, University of London. Available at http://cartesianproduct.files.wordpress.com/2011/12/main.pdf

[pri76] Prieve, B. G. and R. S. Fabry. 1976. VMIN -- an optimal variable-space page replacement algorithm. Communications of ACM 19, 5 (May), 295-297.

[ran68] Randell, B. and C. J. Kuehner. 1968. Dynamic storage allocation systems. Communications of ACM 11, 5 (May), 297-306.

[say69] Sayre, D. 1969. Is automatic folding of programs efficient enough to displace manual? Communications of ACM 13, 12 (December), 656-660.

[spi72] Spirn, J. R. and P. J. Denning. 1972. Experiments with program locality. Proc AFIPS Conf 41, SJCC. AFIPS Press.

[wan15] Wang, X., et al. 2015. Optimal footprint symbiosis in shared cache. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, 412-422.

[wir14] Wires, J. et al. 2014. Characterizing storage workloads with counter stacks. USENIX Proc. 11th Conf. on Operating Systems Design and Implementation, 335-349.

[xia11] Xiaoya Xiang, Bin Bao, Chen Ding, Yaoqing Gao. 2011. Linear-time modeling of program working set in shared cache. PACT 2011, 350-360.

[xia13] Xiaoya Xiang, Chen Ding, Hao Luo, Bin Bao. 2013. HOTL: a higher order theory of locality. ASPLOS 2013, 343-356.


[xia18] Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Zhenlin Wang, Chen Ding, and Chencheng Ye. 2018. Fast miss ratio curve modeling for storage cache. ACM Transactions on Storage 14, 2 (April), Article 12, 34 pp.

[ye17] Ye, C., C. Ding, H. Luo, J. Brock, D. Chen, and H. Jin. 2017. Cache exclusivity and sharing: Theory and optimization. ACM Trans. on Architecture and Code Optimization (TACO) 14, 4, 1-26.

[yua18] Yuan, L., W. Smith, S. Fan, Z. Chen, C. Ding, and Y. Zhang. 2018. Footmark: A new formulation for working set statistics. In Int'l Workshop on Languages and Compilers for Parallel Computing (October), 61-69. Springer.

Acknowledgements

I am deeply grateful to Jeff Buzen, Chen Ding, Erol Gelenbe, Roland Ibbett, and Adrian McMenamin for many conversations that sharpened the ideas written here. And I fondly remember my early teachers and collaborators in this work, Les Belady, Fernando Corbato, Jack Dennis, Roger Needham, Brian Randell, Jerry Saltzer, Stuart Schwartz, Donald Slutz, and Maurice Wilkes.


APPENDIX: FOOTPRINT AND FOOTMARK

Chen Ding at the University of Rochester defined two measures of load on the cache, called footprint and footmark [lia19, xia11, xia13, xia18].

The footprint measure was motivated by a desire for accuracy: for the first T-1 references of a trace, the working set contents are truncated windows, potentially underestimating mean working set size. The footprint avoids this by averaging together only the working sets whose T windows fit completely inside the address trace:

$$
fp(T) = \frac{1}{N-T+1} \sum_{t=T}^{N} w(t,T)
$$

The footprint and mean working set measures do not differ significantly under practical conditions. The definition of mean working set size includes the definition of footprint:

\[ s(T) = \frac{1}{N} \sum_{t=1}^{N} w(t,T) = \frac{1}{N} \sum_{t=1}^{T-1} w(t,t) + \frac{N-T+1}{N}\, \mathit{fp}(T) \]

Since window size is an upper bound for any working set size, w(t,t) ≤ t, and the first sum has an upper bound of (1+2+3+…+T-1)/N = T(T-1)/2N < T²/2N. The second term has an upper bound of fp(T). Therefore,

\[ s(T) < \frac{T^2}{2N} + \mathit{fp}(T) \]

This reduces to

\[ s(T) - \mathit{fp}(T) < 1, \quad \text{if } T \le \sqrt{2N} \]

In other words, the mean working set size and footprint are within 1 page of each other as long as T ≤ √(2N). This will be easily true for programs with good locality, as in Figure 1.
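
The bound can be checked numerically. The following is a minimal sketch, not taken from the cited papers: it evaluates s(T) and fp(T) directly from their definitions (in quadratic time, for clarity rather than speed) on a made-up phased trace. The function names `ws_size`, `mean_ws`, and `footprint` and the trace itself are illustrative assumptions.

```python
import math

def ws_size(trace, t, T):
    """w(t, T): number of distinct pages in the window of length T ending at time t.

    Times are 1-indexed. The window is truncated at the start of the trace,
    so w(t, T) = w(t, t) whenever t < T.
    """
    return len(set(trace[max(0, t - T):t]))

def mean_ws(trace, T):
    """s(T): average of w(t, T) over all N times, truncated windows included."""
    N = len(trace)
    return sum(ws_size(trace, t, T) for t in range(1, N + 1)) / N

def footprint(trace, T):
    """fp(T): average of w(t, T) over the N-T+1 windows that fit inside the trace."""
    N = len(trace)
    return sum(ws_size(trace, t, T) for t in range(T, N + 1)) / (N - T + 1)

# Hypothetical trace: four phases, each cycling ten times over its own three pages.
trace = [3 * phase + i for phase in range(4) for _ in range(10) for i in range(3)]
N = len(trace)

# Check s(T) - fp(T) < 1 for every integer T with T <= sqrt(2N).
for T in range(1, math.isqrt(2 * N) + 1):
    gap = mean_ws(trace, T) - footprint(trace, T)
    assert gap < 1
```

Because the inequality is derived for any trace, the assertion should hold for other reference strings as well; the phased trace is only a convenient stand-in for a program with good locality.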

Ding and his colleagues also developed a recursion for fp(T) that would enable calculating footprint for all T in linear time. Here is a derivation of a recursion. Start by writing the footprint for window T+1:

\[ \mathit{fp}(T+1) = \frac{1}{N-T} \sum_{t=T+1}^{N} w(t,T+1) = \frac{1}{N-T} \left[ \sum_{t=T}^{N} w(t,T+1) - w(T,T+1) \right] \]

The terms involving w(t,T+1) can be reduced to terms involving w(t,T) using the working set size recursion developed earlier. Recall that, when T is increased to T+1, all the runs ending in a reuse interval >T are lengthened by 1. Because every run begins with a page miss, and all the first references in the interval [1,…,T-1] start runs that extend into [T,…,N], the number of 1s added to the page reference map in the interval [T,…,N] is mc(T). In other words, the same recursion applies to the extended sum, with the same end corrections as before:

\[ \sum_{t=T}^{N} w(t,T+1) = \sum_{t=T}^{N} w(t,T) + \mathit{mc}(T) - e(T) \]

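As earlier, e(T) is the number of last references occurring in the last T time units. Define f(T) as the number of first references in the first T time units, and notice that f(T) = w(T,T+1), because the window of length T+1 ending at time T is truncated to the first T references. The footprint formula becomes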

\[ \mathit{fp}(T+1) = \frac{1}{N-T} \left[ \sum_{t=T}^{N} w(t,T) + \mathit{mc}(T) - e(T) - f(T) \right] \]

Applying the definition of fp(T) and recalling that m(T) = mc(T)/N, this becomes the desired recursion,

\[ \mathit{fp}(T+1) = \frac{N-T+1}{N-T}\, \mathit{fp}(T) + \frac{N}{N-T}\, m(T) - \frac{e(T) + f(T)}{N-T} \]

This messy expression can be computed in linear time because mc(T), e(T), and f(T) are all computable in linear time.
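
The recursion can be exercised on a small trace. This is a sketch under stated assumptions, not the algorithm published by Ding and colleagues: it takes mc(T) to be the number of references whose reuse interval exceeds T (first references counted as misses), e(T) and f(T) as defined above, and fp(1) = 1 as the starting value. The names `footprints_linear` and `footprint_brute` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so the result can be compared for equality with the definition; the number of arithmetic steps is linear in N.

```python
from fractions import Fraction

def footprints_linear(trace):
    """Return fp with fp[T] = footprint for window T, T = 1..N (fp[0] is unused).

    Assumes a non-empty trace.
    """
    N = len(trace)
    last_seen = {}                 # page -> time of its most recent reference
    reuse = [0] * (N + 1)          # reuse[k] = references with reuse interval k
    first_at = [0] * (N + 2)       # first_at[t] = 1 if time t holds a first reference
    for t, page in enumerate(trace, start=1):
        if page in last_seen:
            reuse[t - last_seen[page]] += 1
        else:
            first_at[t] = 1
        last_seen[page] = t
    last_at = [0] * (N + 2)        # last_at[t] = 1 if time t holds a last reference
    for t in last_seen.values():
        last_at[t] = 1

    # mc[T]: first references plus reuse intervals longer than T.
    mc = [0] * (N + 1)
    longer = 0
    for T in range(N, -1, -1):
        mc[T] = len(last_seen) + longer
        longer += reuse[T]

    # f[T]: first references in the first T time units.
    # e[T]: last references in the last T time units.
    f = [0] * (N + 1)
    e = [0] * (N + 1)
    for T in range(1, N + 1):
        f[T] = f[T - 1] + first_at[T]
        e[T] = e[T - 1] + last_at[N - T + 1]

    fp = [None, Fraction(1)]       # every window of length 1 holds exactly one page
    for T in range(1, N):
        m = Fraction(mc[T], N)     # working set miss rate m(T) = mc(T)/N
        fp.append(Fraction(N - T + 1, N - T) * fp[T]
                  + Fraction(N, N - T) * m
                  - Fraction(e[T] + f[T], N - T))
    return fp

def footprint_brute(trace, T):
    """Direct evaluation of the footprint definition, for cross-checking only."""
    N = len(trace)
    total = sum(len(set(trace[t - T:t])) for t in range(T, N + 1))
    return Fraction(total, N - T + 1)

trace = list("abacabadcdcdeefe")   # hypothetical reference string
fp = footprints_linear(trace)
assert all(fp[T] == footprint_brute(trace, T) for T in range(1, len(trace) + 1))
```

Tracking the integer sum (N-T+1)·fp(T) instead of fp(T) itself would avoid the rational arithmetic; the form above was chosen only because it mirrors the recursion term by term.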

Chen Ding was not happy with the messiness of algorithms for computing footprint. He and his students defined a new measure they called footmark. Footmark would be like WS but would contain additional terms that compensated for the short working set windows during the initial segment of the trace. They divided an address trace into three regions:

• The initial segment of length T-1. In this interval the working set sizes have the form w(t,t), effectively a window smaller than T.

• The second segment has length N-T+1. In this segment the working set sizes are of the form w(t,T). This segment includes all the windows of length T, as in footprint.

• The third segment is the last T-1 references of the trace. In this segment a series of phantom working sets of sizes w(N,k) are defined with progressively shorter windows k looking back from the end of the trace. They are “phantoms” because they are not actually observed in a real cache. The idea is that w(N,k) can be paired with w(T-k,T-k) in the initial segment; every pair spans a full window length T. After the pairing, the short-window working sets of the initial segment are replaced by full-window working sets. The net effect is that N working sets with window T define footmark.

These ideas produced the following definition of footmark space-time FM(T):

\[ \mathit{FM}(T) = \sum_{t=1}^{T-1} w(t,t) + \sum_{t=T}^{N} w(t,T) + \sum_{k=1}^{T-1} w(N,k) \]

The pairing argument says that the first and third sums can be replaced with a single sum of T-1 terms w(t,t) + w(N,T-t), so that every working set in FM(T) has window T. Therefore, the footmark is fm(T) = FM(T)/N.

Because the first two sums are the definition of working set space-time,

\[ \mathit{FM}(T) = \mathit{st}(T) + \sum_{k=1}^{T-1} w(N,k) \]

We can now apply the working set recursion to define a footmark recursion:

\[ \mathit{FM}(T+1) = \mathit{st}(T+1) + \sum_{k=1}^{T} w(N,k) = \mathit{st}(T) + \mathit{mc}(T) - e(T) + \sum_{k=1}^{T-1} w(N,k) + w(N,T) \]

where e(T) is the number of end intervals ≤ T. Now consider the working set w(N,T). A page is visible in that working set if and only if its final reference occurs at or after time N-T+1. Thus the contents of that working set are precisely the pages whose end intervals are ≤ T, which is the definition of e(T). This cancels the e(T) term. The st(T) and sum terms combine into the definition of FM(T). Thus

\[ \mathit{FM}(T+1) = \mathit{FM}(T) + \mathit{mc}(T) \]

When all terms are divided by N, we get the footmark recursion:

\[ \mathit{fm}(T+1) = \mathit{fm}(T) + m(T) \]

where fm(T) is the footmark for window size T and m(T) is the working set miss rate. The initial conditions are fm(0) = 0 and m(0) = 1.
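
The footmark recursion can be compared against the three-sum definition of FM(T) on a small trace. The sketch below is illustrative only: the helper names `miss_counts`, `footmark_recursive`, and `footmark_brute` are hypothetical, the trace is made up, and mc(T) is assumed to count references whose reuse interval exceeds T, with first references counted as misses.

```python
from fractions import Fraction

def miss_counts(trace):
    """mc[T] for T = 0..N: references whose reuse interval exceeds T.

    First references have no earlier use and are counted as misses for every T.
    """
    N = len(trace)
    last_seen = {}                 # page -> time of its most recent reference
    reuse = [0] * (N + 1)          # reuse[k] = references with reuse interval k
    for t, page in enumerate(trace, start=1):
        if page in last_seen:
            reuse[t - last_seen[page]] += 1
        last_seen[page] = t
    mc = [0] * (N + 1)
    longer = 0
    for T in range(N, -1, -1):
        mc[T] = len(last_seen) + longer
        longer += reuse[T]
    return mc

def footmark_recursive(trace):
    """fm[T] for T = 0..N from FM(T+1) = FM(T) + mc(T), with FM(0) = 0.

    Assumes a non-empty trace.
    """
    N = len(trace)
    mc = miss_counts(trace)
    FM = [0] * (N + 1)
    for T in range(N):
        FM[T + 1] = FM[T] + mc[T]
    return [Fraction(total, N) for total in FM]

def footmark_brute(trace, T):
    """fm(T) from the three-sum definition of FM(T), for cross-checking only."""
    N = len(trace)

    def w(t, k):
        # Distinct pages in the window of length k ending at time t (truncated at time 1).
        return len(set(trace[max(0, t - k):t]))

    total = sum(w(t, T) for t in range(1, N + 1))   # initial and second segments
    total += sum(w(N, k) for k in range(1, T))      # phantom working sets
    return Fraction(total, N)

trace = list("abacabadcdcdeefe")   # hypothetical reference string
fm = footmark_recursive(trace)
assert all(fm[T] == footmark_brute(trace, T) for T in range(len(trace) + 1))
```

Since mc(0) = N, the first step of the loop gives fm(1) = 1, which agrees with the initial conditions fm(0) = 0 and m(0) = 1.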