A Memory Model for RISC-V Sizhuo Zhang, Muralidaran Vijayaraghavan, Arvind RISC-V Workshop, November 29, 2016

Page 2

Why not SC/TSO?

• They both have simple specifications, both axiomatically and operationally
• But simple implementations have low performance
  - Strict ordering requirements for memory instructions
  - To improve performance, one must monitor coherence invalidation traffic to potentially squash executed loads

Page 3

Why not POWER/ARM?

• Their operational models expose too much microarchitectural detail
  - Branch speculation, OOO execution, rollback, etc. are exposed in the memory model specification!
• Their axiomatic models are too complex, with no well-understood relation to microarchitecture
  - One cannot say with confidence whether a particular microarchitectural implementation obeys the model

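Page 4

Why not RMO?

RMO's dependency requirements are too strict

Thread 1              Thread 2
St a=1                Ld r1=b
MEMBAR                Branch r1!=1 goto exit
St b=1                St c=1
                      Ld r2=c
                      r3=a+r2-1
                      Ld r4=[r3]
                      exit:

Initially everything's 0

If r1=1, the branch falls through, Ld r2=c gets 1 from St c=1, so r3=a and Ld r4=[r3] reads address a. RMO's dependency ordering (St c=1 is control dependent on Ld r1=b, Ld r2=c reads the same address as St c=1, and r3 and Ld r4 depend on r2) orders Ld r4 after Ld r1=b, so RMO forbids r1=1, r4=0. Yet an OOO core that speculatively forwards St c=1 to Ld r2=c before the branch resolves can execute Ld r4=[a] early and read 0.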

Page 11

Properties for a new memory model

• Simple specification without microarchitectural details like branch speculation, OOO execution, rollback, etc.
• But establish a correspondence to microarchitectural implementations
• Weaker than SC/TSO, for high-performance, simple implementations
• Inclusion of sufficient fences to force SC-like behavior when necessary

Page 12

Our proposal for RISC-V memory model: WMM

Simple operational specification like SC, TSO, PSO

[Diagram: processors with instantaneous in-order execution attached directly to an instantaneous memory]

SC:
• Stores update memory instantly
• Loads read memory instantly
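As a hedged illustration of how little the SC rule needs, the Python sketch below (our own, not from the talk) runs every program-order-preserving interleaving of a two-thread store-buffering pattern (each thread stores 1 to its own address, then loads the other thread's address) against one instantaneous memory. The helper names `interleavings` and `run_sc` are hypothetical.

```python
from itertools import combinations


def interleavings(t1, t2):
    """Yield every merge of two threads that keeps each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]


def run_sc(schedule):
    """Execute a schedule on one instantaneous memory; unwritten addresses read as 0."""
    mem, regs = {}, {}
    for op, addr, arg in schedule:
        if op == "St":
            mem[addr] = arg                 # a store updates memory instantly
        else:
            regs[arg] = mem.get(addr, 0)    # a load reads memory instantly
    return regs


P1 = [("St", "a", 1), ("Ld", "b", "r1")]
P2 = [("St", "b", 1), ("Ld", "a", "r2")]
outcomes = {(r["r1"], r["r2"]) for r in map(run_sc, interleavings(P1, P2))}
print(sorted(outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

Under SC no interleaving lets both loads miss both stores, so r1=0, r2=0 never appears.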

Page 13

Our proposal for RISC-V memory model: WMM

Simple operational specification like SC, TSO, PSO

[Diagram: each processor (instantaneous in-order execution) has a store buffer in front of the instantaneous memory]

TSO:
• Stores are dequeued in order
• When a store is dequeued from the store buffer, it updates memory instantly
• A load reads the youngest store to its address from the store buffer or, if none is present, reads memory instantly
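A minimal Python sketch of the TSO rules above, assuming one FIFO store buffer per processor; the class name `TSO` and its methods are illustrative only. The scripted schedule keeps both stores buffered while the loads run, which yields the r1=0, r2=0 store-buffering outcome that SC forbids.

```python
from collections import deque


class TSO:
    """Sketch of the TSO abstract machine: one FIFO store buffer per processor
    in front of an instantaneous monolithic memory."""

    def __init__(self, nprocs):
        self.mem = {}
        self.sb = [deque() for _ in range(nprocs)]

    def store(self, p, addr, val):
        self.sb[p].append((addr, val))          # stores enter the buffer in program order

    def dequeue(self, p):
        addr, val = self.sb[p].popleft()        # only the oldest store may leave (in order)
        self.mem[addr] = val                    # and it updates memory instantly

    def load(self, p, addr):
        for a, v in reversed(self.sb[p]):       # youngest buffered store to addr wins
            if a == addr:
                return v
        return self.mem.get(addr, 0)            # otherwise read memory instantly


m = TSO(2)
m.store(0, "a", 1)
m.store(1, "b", 1)
r1 = m.load(0, "b")      # both stores are still buffered ...
r2 = m.load(1, "a")      # ... so both loads read the initial 0
m.dequeue(0)
m.dequeue(1)
print(r1, r2, m.mem)     # 0 0 {'a': 1, 'b': 1}
```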

Page 14

Our proposal for RISC-V memory model: WMM

Simple operational specification like SC, TSO, PSO

[Diagram: each processor (instantaneous in-order execution) has a store buffer in front of the instantaneous memory]

PSO:
• Stores are dequeued in order only for the same address
• When a store is dequeued from the store buffer, it updates memory instantly
• A load reads the youngest store to its address from the store buffer or, if none is present, reads memory instantly
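The only change from TSO to PSO is which buffered store may leave. In this hypothetical sketch, `dequeue` picks the oldest store for a chosen address, so a younger store to b can reach memory before an older store to a, while two stores to a still leave in order.

```python
class PSOStoreBuffer:
    """One processor's PSO store buffer (hypothetical sketch)."""

    def __init__(self):
        self.entries = []                        # (addr, val), oldest first

    def store(self, addr, val):
        self.entries.append((addr, val))

    def dequeue(self, addr, mem):
        """Send the oldest buffered store for addr to memory (FIFO per address only)."""
        i = next(k for k, (a, _) in enumerate(self.entries) if a == addr)
        _, val = self.entries.pop(i)
        mem[addr] = val                          # updates memory instantly

    def load(self, addr, mem):
        for a, v in reversed(self.entries):      # youngest own store to addr
            if a == addr:
                return v
        return mem.get(addr, 0)                  # otherwise memory


mem, sb = {}, PSOStoreBuffer()
sb.store("a", 1)          # program order: St a=1 ...
sb.store("b", 1)          # ... then St b=1
sb.dequeue("b", mem)      # allowed in PSO, not in TSO: the younger store leaves first
print(mem)                # {'b': 1}: another processor could now see b=1 but a=0

mem2, sb2 = {}, PSOStoreBuffer()
sb2.store("a", 1)
sb2.store("a", 2)
sb2.dequeue("a", mem2)    # stores to the same address still leave in order
print(mem2)               # {'a': 1}
```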

Page 15

Our proposal for RISC-V memory model: WMM

Simple operational specification like SC, TSO, PSO

[Diagram: each processor (instantaneous in-order execution) has a store buffer and an invalidation buffer in front of the instantaneous memory]

WMM:
• Stores are dequeued in order only for the same address
• When a store is dequeued from the store buffer, it updates memory instantly, its address is removed from the processor's own invalidation buffer, and the address (with its stale, overwritten value) enters every other processor's invalidation buffer instantly
• A load reads the youngest store to its address from the store buffer or, if not present, the oldest entry for that address in the invalidation buffer or, if not present, memory instantly
• The oldest invalidation buffer entry can be thrown out at any time

Page 16

Fences in WMM

• Acquire/Reconcile fence: clears the invalidation buffer
• Release/Commit fence: waits for the store buffer to be flushed (non-atomically)
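The WMM rules and the two fences on the slides above fit in one small class. This is a sketch under stated assumptions, not the authors' code: the class and method names are hypothetical, the value placed in other invalidation buffers is assumed to be the stale (pre-store) value of the address, and "oldest entry" is read as the oldest entry for one address.

```python
class WMM:
    """Sketch of the WMM abstract machine: per-processor store buffer (sb) and
    invalidation buffer (ib) in front of an instantaneous monolithic memory (mem)."""

    def __init__(self, nprocs):
        self.mem = {}                              # addr -> value; unwritten reads as 0
        self.sb = [[] for _ in range(nprocs)]      # (addr, val), oldest first
        self.ib = [[] for _ in range(nprocs)]      # (addr, stale val), oldest first

    def store(self, p, addr, val):
        self.sb[p].append((addr, val))

    def dequeue(self, p, addr):
        """Move p's oldest buffered store to addr into memory (per-address FIFO)."""
        i = next(k for k, (a, _) in enumerate(self.sb[p]) if a == addr)
        _, val = self.sb[p].pop(i)
        stale = (addr, self.mem.get(addr, 0))
        for q, ib in enumerate(self.ib):
            if q != p:
                ib.append(stale)                   # enters every other ib (assumed stale value)
        self.ib[p] = [e for e in self.ib[p] if e[0] != addr]  # removed from own ib
        self.mem[addr] = val

    def throw_out(self, p, addr):
        """Discard p's oldest ib entry for addr (allowed at any time)."""
        i = next(k for k, (a, _) in enumerate(self.ib[p]) if a == addr)
        del self.ib[p][i]

    def load(self, p, addr):
        for a, v in reversed(self.sb[p]):          # 1) youngest own buffered store
            if a == addr:
                return v
        for a, v in self.ib[p]:                    # 2) oldest stale value in the ib
            if a == addr:
                return v
        return self.mem.get(addr, 0)               # 3) monolithic memory

    def reconcile(self, p):
        self.ib[p].clear()                         # Acquire/Reconcile: clear the ib

    def can_commit(self, p):
        return not self.sb[p]                      # Release/Commit: only once sb is empty


m = WMM(2)                 # index 0 plays P1, index 1 plays P2
m.store(0, "a", 1)
m.store(0, "b", 1)
print(m.can_commit(0))     # False: a Commit in P1 would have to wait
m.dequeue(0, "a")          # the flush happens one store at a time (non-atomically)
m.dequeue(0, "b")
print(m.can_commit(0))     # True
m.throw_out(1, "b")        # P2 drops its stale <b,0> but keeps <a,0>
r1 = m.load(1, "b")        # 1, read from memory
r2 = m.load(1, "a")        # 0, the stale value from P2's ib
m.reconcile(1)
r3 = m.load(1, "a")        # 1, after Reconcile emptied the ib
print(r1, r2, r3)          # 1 0 1
```

P2 sees the two stores in opposite orders only because it still holds the stale <a,0>; the Reconcile removes it.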

Page 17

Axiomatic Definition of WMM

A load reads the younger (in memory order) of:
• the latest store to that address that precedes the load in memory order, OR
• the latest store to that address that precedes the load in program order (in that thread)

Memory order reordering axiom (can the first instruction be reordered with a later second instruction?):

First \ Second    Ld b     St b v'   Acq/Reconcile   Rel/Commit
Ld a              a!=b     No        No              No
St a v            Yes      a!=b      Yes             No
Acq/Reconcile     No       No        No              No
Rel/Commit        Yes      No        No              No

("a!=b": the two accesses may be reordered only if their addresses differ)

• St-St fence: Commit
• Ld-Ld fence: Reconcile
• St-Ld fence: Commit + Reconcile
• Ld-St fence: not needed
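The reordering table can be encoded directly. The hedged sketch below (hypothetical names `TABLE`, `can_reorder`, `ordered_by`) checks whether a chain of fences orders two accesses by requiring every adjacent pair in the chain to be non-reorderable; since memory order is total, that is sufficient, and it reproduces the four fence recipes listed on the slide.

```python
# TABLE[first][second] is True (may reorder), False (must stay ordered),
# or "diff" (may reorder only if the addresses differ).
TABLE = {
    "Ld":        {"Ld": "diff", "St": False,  "Reconcile": False, "Commit": False},
    "St":        {"Ld": True,   "St": "diff", "Reconcile": True,  "Commit": False},
    "Reconcile": {"Ld": False,  "St": False,  "Reconcile": False, "Commit": False},
    "Commit":    {"Ld": True,   "St": False,  "Reconcile": False, "Commit": False},
}


def can_reorder(first, second, same_addr=False):
    entry = TABLE[first][second]
    if entry == "diff":
        return not same_addr
    return entry


def ordered_by(first, fences, second, same_addr=False):
    """Sufficient check that `first; fences...; second` (program order) keeps first
    before second in memory order: every adjacent pair must be non-reorderable."""
    if not fences:
        return not can_reorder(first, second, same_addr)
    chain = [first, *fences, second]
    # Every adjacent pair here involves a fence, so addresses do not matter.
    return all(not can_reorder(x, y) for x, y in zip(chain, chain[1:]))


print(ordered_by("St", ["Commit"], "St"))               # True:  St-St fence
print(ordered_by("Ld", ["Reconcile"], "Ld"))            # True:  Ld-Ld fence
print(ordered_by("St", ["Commit", "Reconcile"], "Ld"))  # True:  St-Ld fence
print(ordered_by("St", ["Commit"], "Ld"))               # False: Commit alone is not enough
print(ordered_by("Ld", [], "St"))                       # True:  Ld-St needs no fence
```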

Page 18

Implementing WMM

• An executed load will not get squashed later as long as it does not overtake a Reconcile or a memory instruction to the same address
  - No monitoring of coherence invalidations
  - Load address speculation allowed: squashed only if the predicted address is wrong
• All instructions are committed in order
  - Stores cannot overtake loads
  - Prevents "out-of-thin-air" generation of values

Formally proven: OOO + single-threaded correctness + in-order commit + value prediction + global store atomicity = WMM

Page 19

Implementing WMM

• A write-back coherent cache hierarchy typically satisfies global store atomicity
• If L1 is write-through, it is easy to ensure global store atomicity unless the core is SMT
  - SMT cores with write-through L1 caches implement a "non-multicopy-atomic" memory
  - Don't do it

“Theoretically, the definition of the aq and rl bits allows for implementations without global store atomicity. When both aq and rl bits are set, however, we require full sequential consistency for the atomic operation which implies global store atomicity in addition to both acquire and release semantics. In practice, hardware systems are usually implemented with global store atomicity, embodied in local processor ordering rules together with single-writer cache coherence protocols.”

Page 20

Mapping C++11 to WMM

C++11               WMM
Non-atomic load     Load
Load relaxed        Load
Load consume        Load; Acquire/Reconcile
Load acquire        Load; Acquire/Reconcile
Load SC             Rel/Commit; Acq/Reconcile; Load; Acq/Reconcile
Non-atomic store    Store
Store relaxed       Store
Store release       Release/Commit; Store
Store SC            Release/Commit; Store

Using the operational specification of WMM makes it straightforward to derive/verify this mapping
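A sketch of the mapping as a lookup table, assuming `seq_cst` stands for SC and that Rel/Commit and Acq/Reconcile are spelled `Commit` and `Reconcile`; `CXX11_TO_WMM` and `compile_thread` are hypothetical names. A message-passing pair then compiles to exactly the St-St and Ld-Ld fences from the reordering table.

```python
# Keys are (access kind, C++11 memory order); values are WMM instructions in program order.
CXX11_TO_WMM = {
    ("load", "non-atomic"): ["Load"],
    ("load", "relaxed"): ["Load"],
    ("load", "consume"): ["Load", "Reconcile"],
    ("load", "acquire"): ["Load", "Reconcile"],
    ("load", "seq_cst"): ["Commit", "Reconcile", "Load", "Reconcile"],
    ("store", "non-atomic"): ["Store"],
    ("store", "relaxed"): ["Store"],
    ("store", "release"): ["Commit", "Store"],
    ("store", "seq_cst"): ["Commit", "Store"],
}


def compile_thread(accesses):
    """Expand a list of (kind, order) C++11 accesses into a flat WMM instruction list."""
    out = []
    for access in accesses:
        out.extend(CXX11_TO_WMM[access])
    return out


# Message passing: the writer stores data, then a release flag;
# the reader acquires the flag, then reads the data.
writer = compile_thread([("store", "non-atomic"), ("store", "release")])
reader = compile_thread([("load", "acquire"), ("load", "non-atomic")])
print(writer)  # ['Store', 'Commit', 'Store']
print(reader)  # ['Load', 'Reconcile', 'Load']
```

An SC store followed by an SC load compiles to Commit; Store; Commit; Reconcile; Load; Reconcile, so the Commit + Reconcile St-Ld fence sits between the two accesses.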

Page 21

Conclusion

• WMM is a memory model with a simple specification and potentially high-performance implementations
  - Blends well with the RISC-V philosophy and should be used as the memory model for RISC-V

Thank you!

Advertisement: formally verified RISC-V (subset of RV32I) multicore implementation in Kami, a hardware formal verification platform

Page 22

Backup

Page 24

Why not Release Consistency?

Fences are not strong enough to give Sequential Consistency

Thread 1        Thread 2          Thread 3
St val=1        Ld r1=val         Ld r2=flag
                St flag=r1        Ld r3=val

Initially, everything is 0

The outcome shown is r1=1, r2=1, r3=0 (St flag=r1 writes 1). Even with the accesses annotated Acquire and Release, this outcome is not forbidden, because the fences are non-cumulative.

Page 25

Out-of-thin-air issue

• No processor can produce values out of thin air
  - But an incomplete set of axioms seemingly allows this
• Insisting on in-order commit, and advertising stores to other threads/processors only after commit, takes care of this issue

Thread 1        Thread 2
Ld r1=x         Ld r2=y
St y=r1         St x=42

Initially everything is 0. Finally x=y=r1=r2=42

Page 26

“The AMOs were designed to implement the C11 and C++11 memory models efficiently. Although the FENCE R, RW instruction suffices to implement the acquire operation and FENCE RW, W suffices to implement release, both imply additional unnecessary ordering as compared to AMOs with the corresponding aq or rl bit set.”

Page 27

Litmus Tests for WMM

Test SB

P1                    P2
I1: St a 1            I4: St b 1
I2: Commit            I5: Commit
I3: r1 = Ld b         I6: r2 = Ld a

WMM allows: r1=0, r2=0

[Diagram: monolithic memory; P1 and P2 each with register state, store buffer, and invalidation buffer; entries shown: <a,1>, <a,0>, <b,1>, <b,0>]

WMM allows the behavior: Ld overtakes St and Commit
Add Reconcile (after I2 in P1 and after I5 in P2) to forbid this
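To check the claim mechanically, the sketch below exhaustively explores the WMM abstract machine for Test SB, with and without the added Reconcile. It is a hypothetical model checker built on the slide rules as we read them (the stale value of a dequeued store enters the other invalidation buffers; the oldest entry per address may be dropped at any time); instruction tuples such as `("Ld", "b", "r1")` are our own encoding.

```python
def put(tup, i, x):
    """Copy of tuple `tup` with element i replaced by x."""
    return tup[:i] + (x,) + tup[i + 1:]


def freeze(d):
    """Hashable snapshot of a dict (memory or registers)."""
    return tuple(sorted(d.items()))


def load_value(sb, ib, mem, addr):
    for a, v in reversed(sb):      # youngest own buffered store
        if a == addr:
            return v
    for a, v in ib:                # oldest stale value for addr
        if a == addr:
            return v
    return mem.get(addr, 0)        # monolithic memory (initially 0)


def steps(progs, state):
    """Yield every successor state under the WMM rules."""
    pcs, sbs, ibs, mem, regs = state
    memd = dict(mem)
    for p, prog in enumerate(progs):
        sb, ib = sbs[p], ibs[p]
        if pcs[p] < len(prog):                      # execute next instruction in order
            ins, nxt = prog[pcs[p]], put(pcs, p, pcs[p] + 1)
            if ins[0] == "St":
                yield (nxt, put(sbs, p, sb + ((ins[1], ins[2]),)), ibs, mem, regs)
            elif ins[0] == "Ld":
                val = load_value(sb, ib, memd, ins[1])
                yield (nxt, sbs, ibs, mem, freeze({**dict(regs), ins[2]: val}))
            elif ins[0] == "Commit" and not sb:     # Commit waits for an empty sb
                yield (nxt, sbs, ibs, mem, regs)
            elif ins[0] == "Reconcile":             # Reconcile clears the ib
                yield (nxt, sbs, put(ibs, p, ()), mem, regs)
        for addr in {a for a, _ in sb}:             # dequeue oldest store for addr
            i = next(k for k, (a, _) in enumerate(sb) if a == addr)
            stale = (addr, memd.get(addr, 0))
            new_ibs = tuple(
                tuple(e for e in ibq if e[0] != addr) if q == p else ibq + (stale,)
                for q, ibq in enumerate(ibs))
            new_mem = freeze({**memd, addr: sb[i][1]})
            yield (pcs, put(sbs, p, sb[:i] + sb[i + 1:]), new_ibs, new_mem, regs)
        for addr in {a for a, _ in ib}:             # throw out oldest ib entry for addr
            i = next(k for k, (a, _) in enumerate(ib) if a == addr)
            yield (pcs, sbs, put(ibs, p, ib[:i] + ib[i + 1:]), mem, regs)


def outcomes(progs, reg_names):
    """Explore all WMM executions; return the set of final register tuples."""
    empty = tuple(() for _ in progs)
    todo, seen, results = [((0,) * len(progs), empty, empty, (), ())], set(), set()
    while todo:
        state = todo.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, regs = state[0], dict(state[4])
        if all(pc == len(prog) for pc, prog in zip(pcs, progs)):
            results.add(tuple(regs.get(r, 0) for r in reg_names))
            continue
        todo.extend(steps(progs, state))
    return results


SB = [[("St", "a", 1), ("Commit",), ("Ld", "b", "r1")],
      [("St", "b", 1), ("Commit",), ("Ld", "a", "r2")]]
SB_RECONCILE = [[("St", "a", 1), ("Commit",), ("Reconcile",), ("Ld", "b", "r1")],
                [("St", "b", 1), ("Commit",), ("Reconcile",), ("Ld", "a", "r2")]]

print((0, 0) in outcomes(SB, ["r1", "r2"]))            # True
print((0, 0) in outcomes(SB_RECONCILE, ["r1", "r2"]))  # False
```

With the Reconcile fences, the reachable results are exactly (0, 1), (1, 0), and (1, 1).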

Page 28

Litmus Tests for WMM

Test MP+data

P1                    P2
I1: St a 1            I4: r1 = Ld b
I2: Commit            I5: r2 = Ld r1
I3: St b a

WMM allows: r1=a, r2=0

[Diagram: monolithic memory; P1 and P2 each with register state, store buffer, and invalidation buffer; entries shown: <a,1>, <a,0>, <b,0>, <b,a>]

WMM allows the behavior:
• Ld overtakes Ld
• No dependency ordering
• Can be caused by value prediction in hardware

Add Reconcile (between I4 and I5) to forbid this

Out-of-thin-air is impossible because of I2E (instantaneous instruction execution)
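One WMM schedule that produces r1=a, r2=0 can be replayed on a small sketch of the abstract machine (hypothetical class and method names; stale values are assumed to enter the other processors' invalidation buffers, and the oldest entry per address may be dropped). P2 drops its stale entry for b but keeps the one for a, so the dependent load still reads the old a; with the added Reconcile that entry is gone.

```python
class WMM:
    """Trimmed, hypothetical sketch of the WMM machine: sb and ib per processor."""

    def __init__(self, nprocs):
        self.mem = {}
        self.sb = [[] for _ in range(nprocs)]   # (addr, val), oldest first
        self.ib = [[] for _ in range(nprocs)]   # (addr, stale val), oldest first

    def store(self, p, addr, val):
        self.sb[p].append((addr, val))

    def dequeue(self, p, addr):
        i = next(k for k, (a, _) in enumerate(self.sb[p]) if a == addr)
        _, val = self.sb[p].pop(i)
        for q, ib in enumerate(self.ib):
            if q != p:
                ib.append((addr, self.mem.get(addr, 0)))  # stale value for others
        self.ib[p] = [e for e in self.ib[p] if e[0] != addr]
        self.mem[addr] = val

    def throw_out(self, p, addr):
        i = next(k for k, (a, _) in enumerate(self.ib[p]) if a == addr)
        del self.ib[p][i]

    def load(self, p, addr):
        for a, v in reversed(self.sb[p]):
            if a == addr:
                return v
        for a, v in self.ib[p]:
            if a == addr:
                return v
        return self.mem.get(addr, 0)

    def reconcile(self, p):
        self.ib[p].clear()

    def can_commit(self, p):
        return not self.sb[p]


def mp_data(with_reconcile):
    m = WMM(2)                      # index 0 is P1, index 1 is P2
    m.store(0, "a", 1)              # I1: St a 1
    m.dequeue(0, "a")               # P2's ib gets <a,0>
    assert m.can_commit(0)          # I2: Commit can now execute
    m.store(0, "b", "a")            # I3: St b a (store the address a into b)
    m.dequeue(0, "b")               # P2's ib gets <b,0>
    m.throw_out(1, "b")             # P2 drops <b,0> but keeps <a,0>
    r1 = m.load(1, "b")             # I4: r1 = Ld b, reads "a" from memory
    if with_reconcile:
        m.reconcile(1)              # added Reconcile between I4 and I5
    r2 = m.load(1, r1)              # I5: r2 = Ld r1
    return r1, r2


print(mp_data(False))  # ('a', 0)
print(mp_data(True))   # ('a', 1)
```

This replays a single schedule. The Reconcile rules out r2=0 in every schedule: once I4 reads b=a, the store to a is already in memory (I2 forced it out), and P1 has no later store to a that could put a stale a back into P2's invalidation buffer.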

Page 29

WMM-S

• The same abstract machine structure as WMM
• Models non-multi-copy-atomic stores
  - Make a store from processor i visible to processor j before the store updates monolithic memory
  - Make a copy of the store from the sb of processor i, and insert the copy into the sb of processor j
  - Each store has a unique tag; copies have the same tag
• Dequeue a store from sb to monolithic memory
  - All copies are dequeued from the sbs
  - All copies have to be the oldest one for the store address in their respective sbs
• Copying a store must be constrained to preserve per-location SC
  - Each sb orders the stores for a given address as a list
  - Combining all such lists from all sbs forms a partial coherence order (<co) of the store tags for that address
  - After copying, the partial coherence order must still be acyclic

[Diagram: processors ps[i], each with register state s, store buffer sb, and invalidation buffer ib, above monolithic memory m]

Page 30

Store copy example

(t_X denotes the tag of store X)

• Current partial coherence order
  - t_D <co t_B <co t_A and t_C <co t_B
  - t_D and t_C are unrelated
• If we copy C into the sb of P1 as C'
  - Creates cycle: t_A <co t_C <co t_B <co t_A
  - Should not be allowed
• If we copy A into the sb of P2
  - Creates cycle: t_A <co t_A

[Diagram: store buffers of P1, P2, and P3, with entries inserted later (younger) on top and entries inserted earlier (older) below; stores A: t_A, B: t_B, C: t_C, D: t_D and copies A': t_A, B': t_B, C': t_C (primes are copies)]
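The copy constraint can be checked as a graph problem. In this hedged sketch, each store buffer is a list of tags for one address (oldest first), every older-to-younger pair adds a <co edge, and a copy is allowed only if the resulting graph stays acyclic. The buffer contents below are one configuration consistent with the order stated on the slide (our assumption, since the figure did not survive extraction); all function names are hypothetical.

```python
def coherence_edges(sbs):
    """Every store buffer lists tags of stores to one address, oldest first;
    each older->younger pair contributes an edge older <co younger."""
    edges = {}
    for sb in sbs.values():
        for i, older in enumerate(sb):
            for younger in sb[i + 1:]:
                edges.setdefault(older, set()).add(younger)
    return edges


def has_cycle(edges):
    """Depth-first search; reaching a node still on the stack means a cycle
    (this also catches a self-loop such as tA <co tA)."""
    on_stack, done = set(), set()

    def visit(u):
        on_stack.add(u)
        for v in edges.get(u, ()):
            if v in on_stack or (v not in done and visit(v)):
                return True
        on_stack.discard(u)
        done.add(u)
        return False

    return any(u not in done and visit(u) for u in list(edges))


def copy_allowed(sbs, tag, dst):
    """Try inserting a copy of `tag` as the youngest entry of processor dst's sb."""
    trial = {p: list(sb) for p, sb in sbs.items()}
    trial[dst].append(tag)
    return not has_cycle(coherence_edges(trial))


# One configuration consistent with the slide (our assumption): oldest entry first.
sbs = {"P1": ["tD", "tB", "tA"],   # D, B, A
       "P2": ["tC", "tB", "tA"],   # C, B', A'
       "P3": ["tC"]}               # C'
print(has_cycle(coherence_edges(sbs)))   # False: tD <co tB <co tA and tC <co tB
print(copy_allowed(sbs, "tC", "P1"))     # False: tA <co tC <co tB <co tA
print(copy_allowed(sbs, "tA", "P2"))     # False: tA <co tA
print(copy_allowed(sbs, "tD", "P3"))     # True:  tC <co tD keeps the order acyclic
```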

Page 31

Litmus Tests for WMM-S

Test WRC

P1              P2                P3
I1: St a 1      I2: r1 = Ld a     I4: r2 = Ld b
                I3: St b r1       I5: Reconcile
                                  I6: r3 = Ld a

WMM-S allows: r1=1, r2=1, r3=0

[Diagram: store buffers of P1, P2, P3 and monolithic memory m; <a,1> in P1's sb with a copy <a,1> in P2's sb; <b,1> in m]

• Add Commit in P2 (between I2 and I3) to forbid this behavior
• Commit globally advertises observed stores (release)
• Reconcile prevents loads from reading stale values (acquire)

Page 32

Litmus Tests for WMM-S

Test IRIW

P1            P2                P3            P4
I1: St a 1    I2: r1 = Ld a     I5: St b 1    I6: r3 = Ld b
              I3: Reconcile                   I7: Reconcile
              I4: r2 = Ld b                   I8: r4 = Ld a

WMM-S allows: r1=1, r2=0, r3=1, r4=0

[Diagram: store buffers of P1-P4 and monolithic memory m; <a,1> in P1's sb with a copy in P2's sb; <b,1> in P3's sb with a copy in P4's sb]

Add Commit in P2 and P4 (before each Reconcile) to forbid this behavior

Page 33

WMM-S Implementation

• WMM-S can be implemented using OOO + a non-atomic memory system
  - e.g., the memory system of the ARM Flowing Model (FM) [1]
  - We do not need a store buffer in OOO, because FM has buffers

[Diagram: OOO processors P1-P4, each with a ROB, attached to segments s[1]-s[4]; segments s[5] and s[6] lie between them and the monolithic memory m of FM]

[1] Flur et al., "Modelling the ARMv8 architecture, operationally: concurrency and ISA", POPL 2016

Page 34

FM+OOO

• Each segment is a buffer of memory requests
  - Keeps FIFO ordering of requests to the same address
  - Flow rule: the oldest request for some address in a segment can be moved to the parent segment or to monolithic memory
  - Bypass rule: a store can forward its data to a load, as long as there is no other request to the same address in between
• OOO commit
  - Store: directly insert into the segment
  - Commit fence: if any segment contains a store observed by the commits of the OOO processor, then the fence cannot commit

A store observed by commits of Pi: either committed by Pi or returned by a load committed by Pi

Simplified version of FM (no fences in FM)

[Diagram: P1-P4 ROBs attached to segments s[1]-s[4], which connect through segments s[5] and s[6] to monolithic memory m]

Page 35

CCM+OOO ⊆ WMM, FM+OOO ⊆ WMM-S

• How WMM/WMM-S simulates CCM/FM+OOO
  - When the monolithic memory in CCM/FM is updated by a store
    - WMM/WMM-S dequeues that store from sb to monolithic memory
  - When OOO commits an instruction
    - WMM/WMM-S executes that instruction
• When OOO Pi commits a load L for address a with result v, consider where v is in CCM/FM+OOO when L commits
  - v is in the monolithic memory of CCM
    - WMM executes L by reading monolithic memory
  - v has been overwritten by another store in monolithic memory
    - WMM has previously inserted <a,v> into the ib of ps[i]
    - Now WMM can execute L by reading the ib
  - v is in the store buffer of OOO Pi
    - If v has been observed by commits of Pi before L is committed, then WMM/WMM-S can execute L by reading the local sb
    - Otherwise, WMM-S copies <a,v> into the local sb and lets L read it

Page 36

Impact of Disallowing Ld-St Reordering

• Qualitative analysis
  - The store buffer can already hide the store miss latency
  - Stores are not on the critical path for single-thread performance
  - In extreme cases, the speculative store queue may be filled up with uncommitted stores
• Quantitative evaluation
  - Simulate an 8-core multiprocessor using the ESESC simulator
  - Run SPLASH2x benchmarks
  - Compare WMM, Alpha, and aggressive implementations of SC and TSO
  - Alpha = WMM + Ld-St reordering
    - Try to find younger stores to commit when the instruction at the commit slot of the ROB cannot commit

Page 37

Simulation Configuration

Page 38

Results

[Figure: normalized execution time and its breakdown at the commit slot of the ROB]

[Figure: normalized execution time and its breakdown at the issue port to the ROB]

[Figure: average cycles to commit stores early in Alpha]

Page 39

Non-Atomic Memory

• Models for non-atomic memory are more complicated
• We are unclear about the performance advantage of non-atomic memory
  - Because our understanding of the microarchitectural sources of non-atomic memory is limited
  - POWER: shared write-through L1 due to SMT
    - Other sources in the hierarchy starting from L2?
  - ARM: no clue
    - Many litmus tests for non-atomic stores are not observable on hardware
    - WRC+addrs, WWC+addrs, IRIW+addrs (http://diy.inria.fr/cats/model-arm/all.html)
    - WRC+addrs (http://www.cl.cam.ac.uk/~sf502/popl16/observations.pdf)
• Only by understanding the microarchitectural reasons for non-atomic memory are we able to analyze its benefit