![Page 1: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/1.jpg)
A Memory Model for RISC-V
Sizhuo Zhang, Muralidaran Vijayaraghavan, Arvind
RISC-V Workshop, November 29, 2016
![Page 2: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/2.jpg)
Why not SC/TSO?
They both have simple specifications, both axiomatically and operationally.
But simple implementations have low performance:
• Strict ordering requirements for memory instructions
• To improve performance, one must monitor coherence invalidation traffic to potentially squash executed loads
![Page 3: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/3.jpg)
Why not POWER/ARM?
Their operational models expose too many microarchitectural details:
• Branch speculation, OOO execution, rollback, etc. are exposed in the memory model specification!
Their axiomatic models are too complex, with no well-understood relation to microarchitecture:
• One cannot say with confidence whether a particular microarchitectural implementation obeys the model
![Page 4: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/4.jpg)
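Why not RMO?
RMO's dependency requirements are too strict.
Initially everything is 0.

| Thread 1 | Thread 2 |
|---|---|
| St a = 1 | Ld r1 = b |
| MEMBAR | Branch r1 != 1 goto exit |
| St b = 1 | St c = 1 |
| | Ld r2 = c |
| | r3 = a + r2 - 1 |
| | Ld r4 = [r3] |
| | exit: |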
![Page 10: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/10.jpg)
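Why not RMO?
RMO's dependency requirements are too strict.
Initially everything is 0.

| Thread 1 | Thread 2 |
|---|---|
| St a = 1 | Ld r1 = b |
| MEMBAR | Branch r1 != 1 goto exit |
| St b = 1 | St c = 1 |
| | Ld r2 = c |
| | r3 = a + r2 - 1 |
| | Ld r4 = [r3] |
| | exit: |

Annotated values: (1), (a), (1), (1), (1)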
![Page 11: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/11.jpg)
Properties for a new memory model
• Simple specification without microarchitectural details like branch speculation, OOO execution, rollback, etc.
• But establish correspondence to microarchitecture implementations
• Weaker than SC/TSO, for high-performance, simple implementations
• Inclusion of sufficient fences to force SC-like behavior when necessary
![Page 12: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/12.jpg)
Our proposal for RISC-V memory model: WMM
Simple operational specification like SC, TSO, PSO
SC:
• Stores update memory instantly
• Loads read memory instantly
[Figure: processors with instantaneous in-order execution, connected to instantaneous memory]
![Page 13: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/13.jpg)
Our proposal for RISC-V memory model: WMM
Simple operational specification like SC, TSO, PSO
TSO:
• Stores are dequeued in order
• When a store is dequeued from the store buffer, it updates memory instantly
• A load reads the youngest store to its address from the store buffer or, if none is present, memory instantly
[Figure: processors with instantaneous in-order execution, each with a store buffer, connected to instantaneous memory]
![Page 14: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/14.jpg)
Our proposal for RISC-V memory model: WMM
Simple operational specification like SC, TSO, PSO
PSO:
• Stores are dequeued in order only for the same address
• When a store is dequeued from the store buffer, it updates memory instantly
• A load reads the youngest store to its address from the store buffer or, if none is present, memory instantly
[Figure: processors with instantaneous in-order execution, each with a store buffer, connected to instantaneous memory]
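The only difference between the TSO and PSO rules is which buffered stores may be dequeued next. A minimal Python sketch of that difference follows; the names `dequeuable` and `load` and the list-of-pairs store buffer are illustrative assumptions, not part of the talk.

```python
from typing import Dict, List, Tuple

Store = Tuple[str, int]  # (address, value); index 0 of a buffer is the oldest store


def dequeuable(sb: List[Store], model: str) -> List[int]:
    """Return the indices of stores that may leave the store buffer next."""
    if not sb:
        return []
    if model == "TSO":
        # TSO: stores are dequeued strictly in order, so only the oldest is eligible.
        return [0]
    if model == "PSO":
        # PSO: order is kept only per address, so the oldest store
        # to each distinct address is eligible.
        seen = set()
        eligible = []
        for i, (addr, _) in enumerate(sb):
            if addr not in seen:
                seen.add(addr)
                eligible.append(i)
        return eligible
    raise ValueError(f"unknown model: {model}")


def load(sb: List[Store], memory: Dict[str, int], addr: str) -> int:
    """Both models: read the youngest buffered store to addr, else memory."""
    for a, v in reversed(sb):
        if a == addr:
            return v
    return memory[addr]


sb = [("a", 1), ("b", 2), ("a", 3)]
memory = {"a": 0, "b": 0, "c": 0}
print(dequeuable(sb, "TSO"))   # [0]
print(dequeuable(sb, "PSO"))   # [0, 1]
print(load(sb, memory, "a"))   # 3
```

Under PSO the store to `b` may reach memory before the older store to `a`, which is exactly the store-store reordering that TSO rules out.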
![Page 15: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/15.jpg)
Our proposal for RISC-V memory model: WMM
Simple operational specification like SC, TSO, PSO
WMM:
• Stores are dequeued in order only for the same address
• When a store is dequeued from the store buffer, it updates memory instantly, removes the address from the processor's own invalidation buffer, and enters the overwritten value into every other invalidation buffer instantly
• A load reads the youngest store to its address from the store buffer or, if none is present, the oldest entry in the invalidation buffer or, if none is present, memory instantly
• The oldest invalidation buffer entry can be thrown out at any time
[Figure: processors with instantaneous in-order execution, each with a store buffer and an invalidation buffer, connected to instantaneous memory]
![Page 16: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/16.jpg)
Fences in WMM
• Acquire/Reconcile fence: clears the invalidation buffer
• Release/Commit fence: waits for the store buffer to be flushed (non-atomically)
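The WMM rules and the two fences can be sketched as a toy abstract machine. This is a sketch under stated assumptions, not the authors' definition: the class name `WMM` and its method names are hypothetical, the entry placed in other invalidation buffers is taken to be the overwritten (stale) memory value, and the Commit fence is modelled by eagerly flushing the store buffer rather than waiting for it to drain.

```python
class WMM:
    """Toy WMM abstract machine (illustrative sketch only)."""

    def __init__(self, n_procs, addrs):
        self.mem = {a: 0 for a in addrs}         # monolithic memory, initially 0
        self.sb = [[] for _ in range(n_procs)]   # store buffers, oldest first
        self.ib = [[] for _ in range(n_procs)]   # invalidation buffers, oldest first

    def store(self, p, addr, val):
        self.sb[p].append((addr, val))

    def load(self, p, addr):
        for a, v in reversed(self.sb[p]):        # youngest store in own store buffer
            if a == addr:
                return v
        for a, v in self.ib[p]:                  # else oldest stale value in own ib
            if a == addr:
                return v
        return self.mem[addr]                    # else memory

    def dequeue(self, p, addr):
        """Move the oldest store to addr from sb[p] into memory."""
        i = next(k for k, (a, _) in enumerate(self.sb[p]) if a == addr)
        _, val = self.sb[p].pop(i)
        old = self.mem[addr]
        self.mem[addr] = val
        self.ib[p] = [(a, v) for a, v in self.ib[p] if a != addr]
        for q in range(len(self.ib)):
            if q != p:
                self.ib[q].append((addr, old))   # stale value stays readable by q

    def drop_oldest_ib(self, p):
        if self.ib[p]:
            self.ib[p].pop(0)

    def commit(self, p):
        """Release/Commit: modelled here by flushing the store buffer eagerly."""
        while self.sb[p]:
            self.dequeue(p, self.sb[p][0][0])

    def reconcile(self, p):
        """Acquire/Reconcile: clear the invalidation buffer."""
        self.ib[p].clear()


m = WMM(n_procs=2, addrs=["a"])
m.store(0, "a", 1)
m.commit(0)               # memory now holds a = 1; processor 1 receives <a,0>
stale = m.load(1, "a")    # reads the stale value from the invalidation buffer
m.reconcile(1)
fresh = m.load(1, "a")    # invalidation buffer is empty, so memory is read
print(stale, fresh)       # 0 1
```

The trace shows why Reconcile acts as an acquire: until it runs, a processor may keep reading a value that memory no longer holds.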
![Page 17: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/17.jpg)
Axiomatic Definition of WMM
Memory order reordering axiom (can the second instruction be reordered before the first?):

| First \ Second | Ld b | St b v' | Acq/Reconcile | Rel/Commit |
|---|---|---|---|---|
| Ld a | a != b | No | No | No |
| St a v | Yes | a != b | Yes | No |
| Acq/Reconcile | No | No | No | No |
| Rel/Commit | Yes | No | No | No |

A load reads the younger (in memory order) of:
• the latest store in memory order for that address, OR
• the latest store in program order (in that thread) for that address

• St-St fence: Commit
• Ld-Ld fence: Reconcile
• St-Ld fence: Commit + Reconcile
• Ld-St fence: not needed
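The reordering table can be encoded directly and used to check the fence recipes listed with it. The sketch below is illustrative: `REORDER`, `can_reorder`, and `enforces_order` are hypothetical names, and `enforces_order` applies only a sufficient condition (every adjacent pair in the sequence is non-reorderable), assuming the two memory accesses use different addresses.

```python
LD, ST, REC, COM = "Ld", "St", "Reconcile", "Commit"

# REORDER[first][second]: True = may reorder, False = never,
# "diff" = may reorder only if the two addresses differ.
REORDER = {
    LD:  {LD: "diff", ST: False,  REC: False, COM: False},
    ST:  {LD: True,   ST: "diff", REC: True,  COM: False},
    REC: {LD: False,  ST: False,  REC: False, COM: False},
    COM: {LD: True,   ST: False,  REC: False, COM: False},
}


def can_reorder(first, second, same_addr=False):
    """Look up whether `second` may be reordered before `first`."""
    entry = REORDER[first][second]
    if entry == "diff":
        return not same_addr
    return entry


def enforces_order(first, fences, second):
    """Sufficient check: no adjacent pair in first; fences; second can reorder."""
    seq = [first, *fences, second]
    return all(not can_reorder(x, y) for x, y in zip(seq, seq[1:]))


print(enforces_order(ST, [COM], ST))        # True  (St-St fence: Commit)
print(enforces_order(LD, [REC], LD))        # True  (Ld-Ld fence: Reconcile)
print(enforces_order(ST, [COM, REC], LD))   # True  (St-Ld fence: Commit + Reconcile)
print(enforces_order(ST, [COM], LD))        # False (Commit alone lets the Ld pass)
print(enforces_order(LD, [], ST))           # True  (Ld-St needs no fence)
```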
![Page 18: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/18.jpg)
Implementing WMM
An executed load won't get squashed later as long as it doesn't overtake a Reconcile or a memory instruction to the same address:
• No monitoring of coherence invalidations
• Load address speculation allowed; squashed only if the predicted address is wrong
All instructions are committed in order:
• Stores cannot overtake loads
• Prevents "out-of-thin-air" generation of values
Formally proven: OOO + single-threaded correctness + in-order commit + value prediction + global store atomicity = WMM
![Page 19: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/19.jpg)
ImplementingWMM
Writeback coherentcachehierarchytypicallysatisfiesGlobalStoreAtomicityIfL1iswrite-through,easytoensureGlobalStoreAtomicityunlessthecoreisSMTn SMTcoreswithL1write-throughcachesimplementa“non-multicopy-atomic”
memory Don’tdoit
“Theoretically, the definition of the aq and rl bits allows for implementations without global store atomicity. When both aq and rl bits are set, however, we require full sequential consistency for the atomic operation which implies global store atomicity in addition to both acquire and release semantics. In practice, hardware systems are usually implemented with global store atomicity, embodied in local processor ordering rules together with single-writer cache coherence protocols.”
Formally proven: OOO + single-threaded correctness + in-order commit + value prediction + global store atomicity = WMM
![Page 20: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/20.jpg)
Mapping C++11 to WMM

| C++11 | WMM |
|---|---|
| Non-atomic Load | Load |
| Load Relaxed | Load |
| Load Consume | Load; Acquire/Reconcile |
| Load Acquire | Load; Acquire/Reconcile |
| Load SC | Rel/Commit; Acq/Reconcile; Load; Acq/Reconcile |
| Non-atomic Store | Store |
| Store Relaxed | Store |
| Store Release | Release/Commit; Store |
| Store SC | Release/Commit; Store |

Using the operational specification of WMM makes it straightforward to derive/verify this mapping.
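The mapping is a plain lookup, so a compiler back end could apply it mechanically. The sketch below only illustrates that idea: `WMM_MAPPING`, `lower`, and the `(kind, order, operand)` tuples are hypothetical names, and the fences are abbreviated to `Commit` and `Reconcile`.

```python
# (access kind, C++11 memory order) -> WMM instruction sequence
WMM_MAPPING = {
    ("load", "non_atomic"): ["Load"],
    ("load", "relaxed"): ["Load"],
    ("load", "consume"): ["Load", "Reconcile"],
    ("load", "acquire"): ["Load", "Reconcile"],
    ("load", "seq_cst"): ["Commit", "Reconcile", "Load", "Reconcile"],
    ("store", "non_atomic"): ["Store"],
    ("store", "relaxed"): ["Store"],
    ("store", "release"): ["Commit", "Store"],
    ("store", "seq_cst"): ["Commit", "Store"],
}


def lower(ops):
    """Expand (kind, order, operand) accesses into WMM instructions."""
    out = []
    for kind, order, operand in ops:
        for instr in WMM_MAPPING[(kind, order)]:
            if instr in ("Load", "Store"):
                out.append(f"{instr} {operand}")   # memory access keeps its operand
            else:
                out.append(instr)                  # fences take no operand
    return out


# Message passing: the producer publishes data, then sets a flag with release.
producer = lower([("store", "relaxed", "data"), ("store", "release", "flag")])
consumer = lower([("load", "acquire", "flag"), ("load", "relaxed", "data")])
print(producer)   # ['Store data', 'Commit', 'Store flag']
print(consumer)   # ['Load flag', 'Reconcile', 'Load data']
```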
![Page 21: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/21.jpg)
Conclusion
WMM is a memory model with a simple specification and potentially high-performance implementations:
• Blends well with the RISC-V philosophy and should be used as the memory model for RISC-V
Thank you!
Advertisement: Formally verified RISC-V (subset of RV32I) multicore implementation in Kami, a hardware formal verification platform
![Page 22: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/22.jpg)
Backup
![Page 24: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/24.jpg)
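Why not Release Consistency?
Fences are not strong enough to give Sequential Consistency: non-cumulative fences.
Initially, everything is 0.

| Thread 1 | Thread 2 | Thread 3 |
|---|---|---|
| St val = 1 | Ld r1 = val | Ld r2 = flag |
| | St flag = r1 | Ld r3 = val |

Annotated values: (1), (1), (1), (0)
Fence labels on the slide: Acquire, Release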
![Page 25: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/25.jpg)
Out-of-thin-air issue
No processor can produce values out of thin air:
• But an incomplete set of axioms seemingly allows this
Insisting on in-order commits, and on advertising stores to other threads/processors only after commit, takes care of this issue.

| Thread 1 | Thread 2 |
|---|---|
| Ld r1 = x | Ld r2 = y |
| St y = r1 | St x = 42 |

Initially everything is 0. Finally x = y = r1 = r2 = 42.
![Page 26: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/26.jpg)
“The AMOs were designed to implement the C11 and C++11 memory models efficiently. Although the FENCE R, RW instruction suffices to implement the acquire operation and FENCE RW, W suffices to implement release, both imply additional unnecessary ordering as compared to AMOs with the corresponding aq or rl bit set.”
![Page 27: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/27.jpg)
Litmus Tests for WMM
Test SB

| P1 | P2 |
|---|---|
| I1: St a 1 | I4: St b 1 |
| I2: Commit | I5: Commit |
| I3: r1 = Ld b | I6: r2 = Ld a |

WMM allows: r1 = 0, r2 = 0
• WMM allows the behavior: Ld overtakes St and Commit
• Add Reconcile to forbid this
[Figure: WMM abstract machine for P1 and P2 (register state, store buffer, invalidation buffer) over monolithic memory; entries shown: <a,1>, <a,0>, <b,1>, <b,0>; a Reconcile is annotated on both P1 and P2]
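The SB outcome can be checked by brute force: enumerate every interleaving of instruction execution, store dequeue, and invalidation-buffer drop, and collect the final register values. The explorer below is a sketch under assumptions, not the authors' tool: the function names and tuple-based instruction encoding are hypothetical, the entry added to other invalidation buffers is taken to be the overwritten value, and the Reconcile is assumed to sit between the Commit and the load.

```python
def explore(progs, addrs):
    """Enumerate all executions; return the set of final register valuations."""
    n = len(progs)

    def put(tup, i, x):
        return tup[:i] + (x,) + tup[i + 1:]

    # State: (memory, program counters, store buffers, invalidation buffers, registers)
    init = (tuple((a, 0) for a in addrs), (0,) * n, ((),) * n, ((),) * n, ())
    seen, stack, finals = {init}, [init], set()

    def push(state):
        if state not in seen:
            seen.add(state)
            stack.append(state)

    while stack:
        mem, pcs, sbs, ibs, regs = stack.pop()
        if all(pc == len(prog) for pc, prog in zip(pcs, progs)):
            finals.add(regs)
        for i in range(n):
            sb, ib = sbs[i], ibs[i]
            # Background step: dequeue the oldest store for each address in sb[i].
            done = set()
            for k, (a, v) in enumerate(sb):
                if a in done:
                    continue
                done.add(a)
                old = dict(mem)[a]
                new_mem = tuple((x, v if x == a else y) for x, y in mem)
                new_ibs = tuple(
                    tuple(e for e in ibs[j] if e[0] != a) if j == i
                    else ibs[j] + ((a, old),)
                    for j in range(n))
                push((new_mem, pcs, put(sbs, i, sb[:k] + sb[k + 1:]), new_ibs, regs))
            # Background step: throw out the oldest invalidation buffer entry.
            if ib:
                push((mem, pcs, sbs, put(ibs, i, ib[1:]), regs))
            # Execute the next instruction of processor i, if it is enabled.
            if pcs[i] == len(progs[i]):
                continue
            op, *args = progs[i][pcs[i]]
            nxt = put(pcs, i, pcs[i] + 1)
            if op == "St":
                push((mem, nxt, put(sbs, i, sb + ((args[0], args[1]),)), ibs, regs))
            elif op == "Ld":
                reg, a = args
                hits = ([v for x, v in reversed(sb) if x == a]
                        or [v for x, v in ib if x == a]
                        or [dict(mem)[a]])
                push((mem, nxt, sbs, ibs, tuple(sorted(regs + ((reg, hits[0]),)))))
            elif op == "Commit" and not sb:      # blocked until sb is empty
                push((mem, nxt, sbs, ibs, regs))
            elif op == "Reconcile":
                push((mem, nxt, sbs, put(ibs, i, ()), regs))
    return finals


def sb_test(with_reconcile):
    fence = [("Reconcile",)] if with_reconcile else []
    p1 = [("St", "a", 1), ("Commit",), *fence, ("Ld", "r1", "b")]
    p2 = [("St", "b", 1), ("Commit",), *fence, ("Ld", "r2", "a")]
    return explore([p1, p2], ["a", "b"])


both_zero = (("r1", 0), ("r2", 0))
print(both_zero in sb_test(with_reconcile=False))   # True
print(both_zero in sb_test(with_reconcile=True))    # False
```

Without Reconcile, each load can still read the stale 0 left in its invalidation buffer by the other processor's dequeued store; with Reconcile, at least one load must see the value 1 in memory.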
![Page 28: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/28.jpg)
Litmus Tests for WMM
Test MP+data

| P1 | P2 |
|---|---|
| I1: St a 1 | I4: r1 = Ld b |
| I2: Commit | I5: r2 = Ld r1 |
| I3: St b a | |

WMM allows: r1 = a, r2 = 0
• WMM allows the behavior: Ld overtakes Ld
• No dependency ordering
• Can be caused by value prediction in hardware
• Add Reconcile to forbid this
Out-of-thin-air is impossible because of I2E.
[Figure: WMM abstract machine for P1 and P2 (register state, store buffer, invalidation buffer) over monolithic memory; entries shown: <a,1>, <a,0>, <b,0>, <b,a>; a Reconcile is annotated on the program]
![Page 29: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/29.jpg)
WMM-S
The same abstract machine structure as WMM.
Model non-multi-copy-atomic stores:
• Make a store from processor i visible to processor j before the store updates monolithic memory
• Make a copy of the store from the sb of processor i, and insert the copy into the sb of processor j
• Each store has a unique tag; copies have the same tag
Dequeue a store from sb to monolithic memory:
• All copies are dequeued from sb
• All copies have to be the oldest one for the store address in their respective sb
Copying of a store must be constrained for per-location SC:
• Each sb orders stores for a certain address as a list
• Combining all such lists from all sb together forms a partial coherence order ($<_{co}$) of the store tags for that address
• After copying, the partial coherence order must still be acyclic
[Figure: processors ps[i], each with register state s, store buffer sb, and invalidation buffer ib, over monolithic memory m]
![Page 30: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/30.jpg)
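Store copy example
Current partial coherence order:
• $t_1 <_{co} t_2 <_{co} t_3$ and $t_4 <_{co} t_2$
• $t_1$ and $t_4$ are unrelated
If we copy C into the sb of P1 as C':
• Creates a cycle: $t_3 <_{co} t_4 <_{co} t_2 <_{co} t_3$
• Should not be allowed
If we copy A into the sb of P2:
• Creates a cycle: $t_3 <_{co} t_3$

Store buffer contents, youngest (inserted later) at the top, oldest (inserted earlier) at the bottom; primes are copies:

| P1 sb | P2 sb | P3 sb |
|---|---|---|
| C': $t_4$ (proposed copy) | A': $t_3$ | B': $t_2$ |
| A: $t_3$ | B: $t_2$ | C: $t_4$ |
| | D: $t_1$ | |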
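The copy constraint amounts to a cycle check on the graph formed by all store buffers. The sketch below assumes all stores target the same address, writes the four tags as `t1` to `t4` (a naming choice made here), and adds a hypothetical empty store buffer `P4` to show a copy that is allowed; `coherence_edges`, `is_acyclic`, and `copy_allowed` are illustrative names.

```python
def coherence_edges(sbs):
    """Collect (older, younger) tag pairs from every store buffer (oldest first)."""
    edges = set()
    for tags in sbs.values():
        for i, older in enumerate(tags):
            for younger in tags[i + 1:]:
                edges.add((older, younger))
    return edges


def is_acyclic(edges):
    """Depth-first search for a cycle in the combined order."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    state = {}  # tag -> "open" while on the DFS stack, "done" once finished

    def visit(u):
        if state.get(u) == "done":
            return True
        if state.get(u) == "open":
            return False            # reached a tag already on the stack: cycle
        state[u] = "open"
        ok = all(visit(v) for v in graph.get(u, ()))
        state[u] = "done"
        return ok

    return all(visit(u) for u in list(graph))


def copy_allowed(sbs, tag, target):
    """Would inserting a copy of `tag` into sb `target` keep the order acyclic?"""
    trial = {p: list(tags) for p, tags in sbs.items()}
    trial[target].append(tag)       # the copy becomes the youngest entry
    return is_acyclic(coherence_edges(trial))


# Store buffers from the example, oldest first; P4 is a hypothetical empty sb.
sbs = {"P1": ["t3"], "P2": ["t1", "t2", "t3"], "P3": ["t4", "t2"], "P4": []}
print(copy_allowed(sbs, "t4", "P1"))   # False: t3 < t4 < t2 < t3
print(copy_allowed(sbs, "t3", "P2"))   # False: t3 < t3
print(copy_allowed(sbs, "t4", "P4"))   # True: an empty sb adds no ordering
```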
![Page 31: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/31.jpg)
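Litmus Tests for WMM-S
Test WRC

| P1 | P2 | P3 |
|---|---|---|
| I1: St a 1 | I2: r1 = Ld a | I4: r2 = Ld b |
| | I3: St b r1 | I5: Reconcile |
| | | I6: r3 = Ld a |

WMM-S allows: r1 = 1, r2 = 1, r3 = 0
• Add Commit in P2 to forbid this behavior
• Commit globally advertises observed stores (release)
• Reconcile prevents loads from reading stale values (acquire)
[Figure: store buffers of P1, P2, and P3 over monolithic memory m; entries shown: <a,1> (P1 sb), <a,1> (P2 sb), <b,1> (m); a Commit is annotated on the program]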
![Page 32: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/32.jpg)
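Litmus Tests for WMM-S
Test IRIW

| P1 | P2 | P3 | P4 |
|---|---|---|---|
| I1: St a 1 | I2: r1 = Ld a | I5: St b 1 | I6: r3 = Ld b |
| | I3: Reconcile | | I7: Reconcile |
| | I4: r2 = Ld b | | I8: r4 = Ld a |

WMM-S allows: r1 = 1, r2 = 0, r3 = 1, r4 = 0
[Figure: store buffers of P1 to P4 over monolithic memory m; entries shown: <a,1> (P1 sb), <a,1> (P2 sb), <b,1> (P3 sb), <b,1> (P4 sb); two Commit annotations on the program]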
![Page 33: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/33.jpg)
WMM-S Implementation
WMM-S can be implemented using OOO + a non-atomic memory system:
• e.g. the memory system of the ARM Flowing Model (FM) [1]
• We do not need a store buffer in OOO, because FM has buffers
[Figure: OOO processors P1 to P4, each with a ROB, attached to segments s[1] to s[4]; segments s[5] and s[6] and monolithic memory m complete the FM hierarchy]
[1] Flur et al. "Modelling the ARMv8 architecture, operationally: concurrency and ISA", POPL 2016
![Page 34: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/34.jpg)
FM + OOO
Each segment is a buffer of memory requests:
• Keeps FIFO ordering of requests to the same address
• Flow rule: the oldest request for some address in a segment can be moved to the parent segment or monolithic memory
• Bypass rule: a store can forward its data to a load, as long as there is no other request to the same address in between
OOO commit:
• Store: directly insert into segment
• Commit fence: if any segment contains a store observed by the commits of the OOO processor, then we cannot commit the fence
A store observed by commits of Pi: either committed by Pi or returned by a load committed by Pi.
Simplified version of FM (no fence in FM).
[Figure: P1 to P4, each with a ROB, attached to segments s[1] to s[4]; segments s[5] and s[6]; monolithic memory m]
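The flow rule can be illustrated with a small tree of segments. The sketch below is heavily simplified and rests on assumptions: the tree shape (s1 and s2 under s5, s3 and s4 under s6) is hypothetical, only stores are buffered, and `read` approximates a load flowing toward memory and taking data from the nearest store to the same address under the bypass rule. `FlowingMemory` and its methods are illustrative names, not part of FM.

```python
# Hypothetical tree: s1, s2 under s5; s3, s4 under s6; s5, s6 under memory "m".
PARENT = {"s1": "s5", "s2": "s5", "s3": "s6", "s4": "s6", "s5": "m", "s6": "m"}


class FlowingMemory:
    def __init__(self, addrs):
        self.seg = {s: [] for s in PARENT}   # per segment: (addr, val), oldest first
        self.mem = {a: 0 for a in addrs}     # monolithic memory, initially 0

    def commit_store(self, leaf, addr, val):
        """OOO commit of a store: insert it directly into the leaf segment."""
        self.seg[leaf].append((addr, val))

    def flow(self, seg, addr):
        """Flow rule: move the oldest store to addr one level toward memory."""
        i = next(k for k, (a, _) in enumerate(self.seg[seg]) if a == addr)
        store = self.seg[seg].pop(i)
        parent = PARENT[seg]
        if parent == "m":
            self.mem[addr] = store[1]
        else:
            self.seg[parent].append(store)   # arrives as the youngest request

    def read(self, leaf, addr):
        """Approximate a load: nearest buffered store on the path, else memory."""
        seg = leaf
        while seg != "m":
            for a, v in reversed(self.seg[seg]):
                if a == addr:
                    return v
            seg = PARENT[seg]
        return self.mem[addr]


fm = FlowingMemory(["a"])
fm.commit_store("s1", "a", 1)
fm.flow("s1", "a")                 # the store now sits in s5
near = fm.read("s2", "a")          # s2 shares s5, so it already sees the store
far = fm.read("s3", "a")           # s3's path misses s5, so it still sees 0
fm.flow("s5", "a")                 # the store reaches monolithic memory
print(near, far, fm.read("s3", "a"))   # 1 0 1
```

The middle of the trace is the non-multi-copy-atomic window: one processor observes the store while another does not, before memory is updated.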
![Page 35: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/35.jpg)
CCM + OOO ⊆ WMM
FM + OOO ⊆ WMM-S
How WMM/WMM-S simulates CCM/FM + OOO:
• When the monolithic memory in CCM/FM is updated by a store
  • WMM/WMM-S dequeues that store from sb to monolithic memory
• When OOO commits an instruction
  • WMM/WMM-S executes that instruction
When OOO Pi commits a load L for address a with result v:
• Consider where v is in CCM/FM + OOO when L commits
• v is in monolithic memory of CCM
  • WMM executes L by reading monolithic memory
• v has been overwritten by another store in monolithic memory
  • WMM has previously inserted <a,v> into ib of ps[i]
  • Now WMM can execute L by reading ib
• v is in store buffer of OOO Pi
  • If v has been observed by commits of Pi before L is committed, then WMM/WMM-S can execute L by reading local sb
  • Otherwise, WMM-S fires a copy of <a,v> into local sb and lets L read it
![Page 36: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/36.jpg)
Impact of Disallowing Ld-St Reordering
Qualitative analysis:
• Store buffer can already hide the store miss latency
• Stores are not on the critical path for single-thread performance
• In extreme cases, the speculative store queue may be filled up with uncommitted stores
Quantitative evaluation:
• Simulate an 8-core multiprocessor using the ESESC simulator
• Run SPLASH2x benchmarks
• Compare WMM, Alpha, and aggressive implementations of SC and TSO
• Alpha = WMM + Ld-St reordering
  • Try to find younger stores to commit when the instruction at the commit slot of the ROB cannot commit
![Page 37: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/37.jpg)
Simulation Configuration
![Page 38: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/38.jpg)
Results
[Figure: normalized execution time and its breakdown at the commit slot of ROB]
[Figure: normalized execution time and its breakdown at the issue port to ROB]
[Figure: average cycles to commit stores early in Alpha]
![Page 39: A Memory Model for RISC-V...I6: r2 = Ld a WMM allows: r1=0, r2=0 Monolithic memory P1 Reg state Store buffer Inv buffer P2 Reg state Store buffer Inv buffer](https://reader034.fdocuments.us/reader034/viewer/2022052009/601f106e81e8706fcd341bce/html5/thumbnails/39.jpg)
Non-Atomic Memory
Models for non-atomic memory are more complicated.
We are unclear about the performance advantage of non-atomic memory:
• Because our understanding of the microarchitectural sources for non-atomic memory is limited
• POWER: shared write-through L1 due to SMT
  • Other sources in the hierarchy starting from L2?
• ARM: no clue
  • Many litmus tests for non-atomic stores are not observable on hardware
  • WRC+addrs, WWC+addrs, IRIW+addrs (http://diy.inria.fr/cats/model-arm/all.html)
  • WRC+addrs (http://www.cl.cam.ac.uk/~sf502/popl16/observations.pdf)
Only by understanding the microarchitectural reasons for non-atomic memory are we able to analyze the benefit of it.