Celerity: An Open Source 511-core RISC-V Tiered Accelerator Fabric
Prof. Michael Taylor, Bespoke Silicon Group
University of Washington
http://www.opencelerity.org
Outline
The Celerity Open Source RISC-V Tiered Accelerator Fabric: Fast Architectural Design Methodologies for Fast Chips
BaseJump: Designing the DNA for Open Source ASICs
BaseJump Manycore: “Open Source for GPU”
http://www.opencelerity.org
http://www.bjump.org
http://www.bjump.org/manycore
The Celerity Open Source RISC-V Tiered Accelerator Fabric: Fast Architectural Design Methodologies for Fast Chips
Ritchie Zhao, Chun Zhao, Shaolin Xie, Bandhav Veluri, Luis Vega, Christopher Torng, Ningxiao Sun, Austin Rovinski, Anuj Rao, Gai Liu, Paul Gao, Scott Davidson, Steve Dai, Aporva Amarnath, Khalid Al-Hawaj, Tutu Ajayi
Christopher Batten, Ronald G. Dreslinski, Rajesh K. Gupta, Michael B. Taylor, Zhiru Zhang
Building with the RISC-V Software/Hardware Ecosystem
Celerity :: Introduction
Software Toolchain
• A complete, off-the-shelf software stack (e.g., binutils, GCC, newlib/glibc, Linux kernel & distros) for both embedded and general-purpose
Architecture
• RISC-V ISA specification designed to be both modular and extensible, with a small base ISA and optional extensions
Microarchitecture
• On-chip network specifications and implementations (NASTI, TileLink)
• RISC-V processor implementations for both in-order (Berkeley Rocket) and out-of-order (Berkeley BOOM) cores
Physical Design
• Previous spins of chips for reference
Testing
• Standard core verification test suites + turn-key FPGA gateware
[Figure: layers of the computing stack: Application, Algorithm, Programming Language, Operating System, Compilers, Instruction Set Architecture, Microarchitecture, Register-Transfer Level, Gate-Level, Circuits, Devices, Technology]
The Celerity System-on-Chip
Celerity, an accelerator-centric SoC with a tiered accelerator fabric that targets highly performant and energy-efficient embedded systems
Funded by the DARPA CRAFT program, “Circuit Realization At Faster Timescales”
The goal was to develop new methodologies to design chips more quickly
General-Purpose Tier
Manycore Tier
Specialization Tier
We leveraged the RISC-V software/hardware ecosystem as we built Celerity, and we believe it was instrumental in enabling a team of 20 graduate students to tape out a complex SoC in only 9 months
[Figure: Celerity block diagram; labels include five RISC-V Rocket cores (each with I-Cache, D-Cache, NASTI, and RoCC), the manycore tile (RISC-V Vanilla-5 core, I Mem, D Mem, XBAR, NoC router), and the BaseJump FSB and Motherboard]
Celerity: Chip Overview
• TSMC 16nm FFC
• 25 mm² die area (5 mm x 5 mm)
• ~385 million transistors
• 511 RISC-V cores
  • 5 Linux-capable RV64G Berkeley Rocket cores
  • 496-core RV32IM mesh tiled array “manycore”
  • 10-core RV32IM mesh tiled array (low voltage)
• Binarized Neural Network specialized accelerator
• On-chip synthesizable PLLs and DC/DC LDO
  • Developed in-house
• 3 clock domains
  • 400 MHz – DDR I/O
  • 625 MHz – Rocket core + specialized accelerator
  • 1.05 GHz – Manycore array
• 672-pin flip chip BGA package
• 9 months from PDK access to tape-out
http://www.opencelerity.org
Agenda
• Introduction
• For each tier:
  • What did we build?
  • How did we build it?
  • RISC-V ecosystem successes
  • RISC-V ecosystem challenges
• Conclusion
Celerity: General-Purpose Tier
Celerity :: General-Purpose Tier :: What is it? • How did we build it? • Successes with RISC-V • Challenges with RISC-V
General-Purpose Tier Overview
• 5 Berkeley Rocket cores (RV64G)
• Workload
  • General-purpose compute
  • Operating system (e.g., Linux & TCP/IP stack)
  • Interrupt and exception handling
  • Program dispatch and control flow
• Interface
  • Interface to off-chip I/O and other peripherals
  • 4 cores connect to the manycore array
  • 1 core interfaces with the BNN
• Memory
  • Each core executes independently within its own address space
  • Memory management for all tiers
Berkeley Rocket Cores
• 5 Berkeley Rocket cores (https://github.com/freechipsproject/rocket-chip)
  • Generated from Chisel
  • RV64G ISA
  • 5-stage, in-order, scalar processor
  • Double-precision floating point
  • I-Cache: 16 KB, 4-way assoc.
  • D-Cache: 16 KB, 4-way assoc.
• Physical implementation
  • ~900 MHz
  • 5 cores per mm²
http://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/
Design Iterations
1. Loopback: baseline design to validate FSB and Northbridge
2. Alpaca: implemented NASTI bridge and connected Rocket core
3. Bison: implemented accelerator connected through blackboxed RoCC
4. Coyote: modularized RoCC interface to accelerator
[Figure: one block diagram per iteration; labels include BaseJump Motherboard, BaseJump FSB, Loopback FIFO, NASTI, RISC-V Rocket Core with I-Cache and D-Cache, RoCC, and Accelerator]
Off-Chip Interface and Northbridge
• Open-source BaseJump IP library
  • http://bjump.org
• Front side bus
  • BaseJump Communication Link
  • High-speed (DDR) source-synchronous communication interface
• Packaging
  • Modified BaseJump BGA package and I/O ring
• Validation
  • BaseJump SuperTrouble PCB (daughter card)
  • BaseJump Motherboard (ZedBoard)
[Figure: BaseJump Motherboard and Celerity SoC block diagram; labels include DRAM Controller, Ethernet, SSD, L2 $, JTAG, Clocks, BaseJump FSB & FPGA Bridge, BaseJump FPGA Bridge, and five RISC-V Rocket cores with NASTI, RoCC, I-Cache, and D-Cache]
RISC-V Successes
• Berkeley Rocket cores
  • Very quickly generated validated designs
  • Vibrant ecosystem to provide feedback and support
  • Test and validation infrastructure
  • Software and toolchain support
• Flexible memory system and peripheral I/O support
  • Easy integration with BaseJump IP library
• Balances extensibility and software support
Outline
The Celerity Open Source RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips
BaseJump: Designing the DNA for Open Source ASICs
BaseJump Manycore: “Open Source for GPU”
http://www.opencelerity.org
http://www.bjump.org
http://www.bjump.org/manycore
Celerity: Manycore Tier (BaseJump Manycore)
Developed by Taylor’s Bespoke Silicon Group @ UW
Celerity :: Manycore Tier :: What is it? • How did we build it? • Successes with RISC-V • Challenges with RISC-V
http://bjump.org/manycore
BaseJump Manycore Architecture
The Vanilla core: simple but efficient to run C code without any toolchain modification
• ISA: RV32IM
• Pipeline: 5-stage, fully forwarded, in-order, single issue
• Scratchpad memory: 4 KB for IMem, 4 KB for DMem
• Second tape-out of this tiled architecture (10-core)
[Figure: 496 RISC-V cores arranged as a tiled array; tile labels include RISC-V Core, IMEM, DMEM, MEM Crossbar, and NOC Router]
BaseJump Manycore Mesh Network
• 80-bit forward links
  • Single-flit
  • <XY_dest, XY_src, data>
  • Parameterized field sizes
• 10-bit reverse links
  • Routes XY_src back to src. Allows fences.
• Router
  • Simple XY-dimension routing
  • 2-element buffering per input port
  • No virtual channels
  • Tiny
  • In-order delivery
  • Deadlock free
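The XY-dimension routing named in the router bullets can be illustrated with a minimal C++ sketch. This is not BaseJump code: the `Port` and `Coord` types, the function names, and the convention that y grows toward South are all assumptions made for illustration. Because every packet closes its X offset before its Y offset, all packets between the same two tiles follow the same path, which is consistent with the in-order, deadlock-free properties listed above.

```cpp
#include <cstdint>

// Output ports of a mesh router (illustrative names, not BaseJump identifiers).
enum class Port { Local, West, East, North, South };

// Tile coordinate: x is the column, y is the row (assumed to grow toward South).
struct Coord {
    std::int32_t x;
    std::int32_t y;
};

// Dimension-ordered (XY) routing: close the X offset first, then the Y offset.
constexpr Port route_xy(Coord here, Coord dest) {
    return dest.x < here.x ? Port::West
         : dest.x > here.x ? Port::East
         : dest.y < here.y ? Port::North
         : dest.y > here.y ? Port::South
         : Port::Local;  // already at the destination tile
}

// Number of router-to-router hops on the XY path between two tiles.
constexpr std::int32_t hop_count(Coord src, Coord dest) {
    return (dest.x > src.x ? dest.x - src.x : src.x - dest.x) +
           (dest.y > src.y ? dest.y - src.y : src.y - dest.y);
}
```

For example, a packet at tile (1, 1) headed for tile (3, 0) leaves through the East port first, and the whole path takes `hop_count` = 3 hops.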
[Figure: mesh of buffered routers; legend entries: forward packet, forward response, reverse packet, reverse response; other labels: buffered router, tilelink protocol]
Manycore Links to General-Purpose and Specialized Tier
Cross clock domain interface
• To General-Purpose Tier: convert RoCC to link protocol, support configuring DMA, write and reset manycore, etc.
• To Specialized Tier: aggregate link interface to increase the bandwidth and throughput
[Figure: four Rocket cores with RoCC interfaces connected through async FIFOs to the manycore array; labels include link_to_rocc, Endpoint, DMA, L1 D-Cache, Core, req, resp, cmd, busy, Router, link widths of 32 and 64, and the General-Purpose Tier, Manycore, and Specialized Tier clock domains]
BaseJump Manycore Programming Support for Streaming
Producer-consumer programming model: extended instructions for efficient inter-tile synchronization
• Load Reserved (lr.w): load value and set the reservation address
• Load-on-broken-reservation (lr.lbr): stall if the reserved address has not yet been written by another core
• Consumer: wait on <address, value>
• Benefits: no polling, no interrupt, fast response, stalled pipeline can save power
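The reservation behavior described in the bullets can be modeled in a few lines of C++. This is a toy, single-threaded state model, not the hardware or its ISA encoding: `ConsumerTile`, `load_reserved`, `remote_store`, and `lbr_stalls` are hypothetical names, and the model tracks only the one reserved word.

```cpp
#include <cstdint>

// State of a consumer tile's reservation (hypothetical model, one watched word).
struct ConsumerTile {
    std::uint32_t reserved_addr;  // address registered by lr.w
    bool          reserved;       // true while the reservation is intact
    std::uint32_t value;          // current contents of the reserved word
};

// lr.w: read the word and set the reservation on its address.
constexpr ConsumerTile load_reserved(std::uint32_t addr, std::uint32_t current_value) {
    return ConsumerTile{addr, true, current_value};
}

// A store arriving from another core: a write to the reserved address
// updates the word and breaks the reservation; other addresses are ignored here.
constexpr ConsumerTile remote_store(ConsumerTile t, std::uint32_t addr, std::uint32_t v) {
    return addr == t.reserved_addr ? ConsumerTile{addr, false, v} : t;
}

// lr.lbr: the pipeline stalls for as long as the reservation is still intact.
constexpr bool lbr_stalls(ConsumerTile t) {
    return t.reserved;
}
```

In this model the consumer makes progress only after a producer's remote store lands on the reserved address, which is the "no polling, no interrupt" property the slide lists.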
[Figure: streaming patterns (Input, Split, Join, Feedback, Pipeline, Output) and producer-consumer programming: Core A issues a remote store over the NoC to the reserved address in Core B's DMEM, which invokes Core B's stalled pipeline waiting for events]
BaseJump Manycore: Current Efforts
CUDA
• Fast porting support for CUDA code
• Porting existing GPU libraries and primitives
Focus on embedded; exploit locality of cores rather than relying on external storage in GDDR5
Happy to provide simulation images on Amazon F1 for those who wish to collaborate on programming models.
Physical Thread Density Comparison
Normalized physical threads (ALU ops) per area

| Configuration | Normalized Area (32nm) | Area Ratio |
|---|---|---|
| Celerity Tile @ 16nm (D-MEM = 4 KB, I-MEM = 4 KB) | 0.024 * (32/16)² = 0.096 mm² | 1x |
| OpenPiton Tile @ 32nm (L1 D-Cache = 8 KB, L1 I-Cache = 16 KB, L1.5/L2 Cache = 72 KB) | 1.17 mm² [1] | 12x |
| Raw Tile @ 180nm (L1 D-Cache = 32 KB, L1 I-SRAM = 96 KB) | 16.0 * (32/180)² = 0.506 mm² | 5.25x |
| MIAOW GPU Compute Unit Lane @ 32nm (VRF = 256 KB, SRF = 2 KB) | 15.0/16 = 0.938 mm² [2] | 9.75x |

[1] J. Balkind, et al. “OpenPiton: An Open Source Manycore Research Framework,” in the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016.
[2] R. Balasubramanian, et al. “Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open-Source RTL Implementation of a GPGPU,” in ACM Transactions on Architecture and Code Optimization (TACO). 12.2 (2015): 21.
• Timing: 1.05 GHz @ 16nm
• Area: 42 cores per mm²
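The normalization in the comparison above scales each area by the square of the ratio between the 32nm reference node and the node the design was built in. A minimal C++ sketch of that arithmetic follows, assuming ideal quadratic area scaling; the function and constant names are illustrative only, and the input numbers are the ones quoted on the slide.

```cpp
// Scale an area measured at node_nm to the equivalent area at target_nm,
// assuming area shrinks with the square of the feature size.
constexpr double normalize_area(double area_mm2, double node_nm, double target_nm) {
    return area_mm2 * (target_nm / node_nm) * (target_nm / node_nm);
}

// Areas normalized to 32nm, using the figures quoted in the comparison.
constexpr double kCelerityTile32 = normalize_area(0.024, 16.0, 32.0);  // about 0.096 mm^2
constexpr double kRawTile32      = normalize_area(16.0, 180.0, 32.0);  // about 0.506 mm^2
constexpr double kOpenPitonTile  = 1.17;                               // already at 32nm
constexpr double kMiaowLane      = 15.0 / 16.0;                        // already at 32nm

// Area of another design relative to one Celerity tile.
constexpr double area_ratio(double other_mm2) {
    return other_mm2 / kCelerityTile32;
}

// Tile density at 16nm: the reciprocal of the 0.024 mm^2 tile area.
constexpr double kTilesPerMm2 = 1.0 / 0.024;  // about 41.7, quoted as 42
```

Computed this way, the ratios land close to the rounded figures in the table: roughly 12.2 for OpenPiton, 5.3 for Raw, and 9.8 for the MIAOW lane.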
How did we build the Manycore tier?
Design (open source)
• BaseJump STL library: dataflow, NoC, arithmetic, …
Testing (open source)
• RISC-V toolchain: assembly test suite, modified runtime, C compiler, …
Hierarchical flow
• Replicated hard macro: one tile (RISC-V Vanilla-5 core, I Mem, D Mem, RF, XBAR, NoC router)
• Size comparison: 1 tile, 1 Rocket, 1 BNN, 1 die
Manycore Tile Efficiency Analysis
Tile:
• Vanilla core (pipeline, ALU, MUL, RF)
• Router
• RF (2R1W)
• 4 KB IMEM, 4 KB DMEM (1RW)
• Timing: 1.05 GHz @ 16nm
• Area: 0.024 mm² @ 16nm
• Utilization ratio: 90%
Cell area breakdown: memory accounts for two-thirds of the area
Celerity: Specialization Tier
Celerity :: Specialization Tier :: What is it? • How did we build it? • Successes with RISC-V • Challenges with RISC-V
Case Study: Mapping Flexible Image Recognition to a Tiered Accelerator Fabric
Three steps to map applications to tiered accelerator fabric:
Step 1. Implement the algorithm using the general-purpose tier
Step 2. Accelerate the algorithm using either the Manycore tier OR the specialization tier
Step 3. Improve performance by cooperatively using both the specialization AND the Manycore tier
[Figure: image-recognition network (Convolution, Pooling, Convolution, Pooling, Fully-connected) producing class scores bird (0.02), boat (0.94), cat (0.04), dog (0.01), shown alongside the General-Purpose, Manycore, and Specialization tiers]
Step 2: Application to Accelerator
General-Purpose Tier for Weight Storage
• The BNN specialized accelerator can use one of the Rocket cores’ caches to load every layer’s weights
[Figure: off-chip I/O and five RISC-V Rocket cores, each labeled with AXI, RoCC, I-Cache, and D-Cache]
Step 3: Assisting Accelerators
Manycore Tier for Weight Storage
• The BNN specialized accelerator can use one of the Rocket cores’ caches to load every layer’s weights
• Each core in the Manycore tier executes a remote-load-store program to orchestrate sending weights to the specialization tier via a hardware FIFO
[Figure: off-chip I/O and five RISC-V Rocket cores, each labeled with AXI, RoCC, I-Cache, and D-Cache]
Performance Benefits of Cooperatively Using the Manycore and the Specialization Tiers
General-Purpose Tier: software implementation assuming ideal performance estimated with an optimistic one instruction per cycle
Specialization Tier: full-system RTL simulation of the BNN specialized accelerator running with a frequency of 625 MHz
Specialization + Manycore Tiers: full-system RTL simulation of the BNN specialized accelerator with the weights being streamed from the manycore

| | General-Purpose Tier | Specialization Tier | Specialization + Manycore Tiers |
|---|---|---|---|
| Runtime per Image (ms) | 4,024 | 20 | 3.3 |
| Power (Watts) | 0.2 – 0.5 | 0.2 – 0.5 | 0.5 – 2.0 |
| Improvement in Perf / Power | 1x | ~200x | ~400x |
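The perf/power row can be sanity-checked from the runtime and power rows. The C++ sketch below is only a back-of-the-envelope check: the slide does not say how the power ranges were combined, so pairing the low ends with each other and the high ends with each other is an assumption, and the type and function names are illustrative.

```cpp
// One column of the results table: runtime per image and the quoted power range.
struct Measurement {
    double runtime_ms;
    double power_lo_w;
    double power_hi_w;
};

constexpr Measurement kGeneralPurpose{4024.0, 0.2, 0.5};
constexpr Measurement kSpecialization{20.0, 0.2, 0.5};
constexpr Measurement kSpecPlusManycore{3.3, 0.5, 2.0};

// Runtime speedup of m over the baseline.
constexpr double speedup(Measurement base, Measurement m) {
    return base.runtime_ms / m.runtime_ms;
}

// Perf/power gain for one chosen pair of power operating points (in watts).
constexpr double perf_per_watt_gain(Measurement base, Measurement m,
                                    double base_w, double m_w) {
    return speedup(base, m) * (base_w / m_w);
}
```

With equal power ranges, the specialization tier's gain equals its speedup of about 201x, matching the ~200x entry. For the combined tiers the speedup is about 1219x, and the assumed pairings give roughly 305x (high ends) to 488x (low ends), which brackets the ~400x entry.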
Design Methodology
[Figure: design flow with stages labeled SystemC, Constraints, Stratus HLS, RTL, PyMTL, Wrappers & Adapters, and Final RTL]
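void bnn::dma_req() {
  while ( 1 ) {
    // Block until the next DMA request message arrives
    DmaMsg msg = dma_req.get();

    // Issue one memory request per element; addresses advance in 8-byte steps
    for ( int i = 0; i < msg.len; i++ ) {
      HLS_PIPELINE_LOOP( HARD_STALL, 1 );

      int    req_type = 0;
      word_t data     = 0;
      addr_t addr     = msg.base + i*8;

      // `type` is not declared in this excerpt; it is presumably the
      // request type of the DMA message or a member of bnn
      if ( type == DMA_TYPE_WRITE ) {
        data     = msg.data;
        req_type = MemReqMsg::WRITE;
      } else {
        req_type = MemReqMsg::READ;
      }

      memreq.put( MemReqMsg( req_type, addr, data ) );
    }

    // Report completion once every request of the transfer has been issued
    dma_resp.put( DMA_REQ_DONE );
  }
}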
Including RoCC interfaces
The Celerity System-on-Chip
Celerity, an accelerator-centric SoC with a tiered accelerator fabric that targets highly performant and energy-efficient embedded systems
Celerity’s goal was to develop new methodologies to design chips more quickly
We believe the RISC-V software/hardware ecosystem was instrumental in enabling a team of 20 graduate students to tape out a complex SoC in only 9 months
General-Purpose Tier
Manycore Tier
Specialization Tier
We thank the many contributors to the open-source RISC-V software and hardware ecosystem, with special thanks to U.C. Berkeley for forming the RISC-V ecosystem
Celerity :: Conclusion
Acknowledgements: DARPA, under the CRAFT program
Special thanks to Dr. Linton Salmon for program support and coordination
http://www.opencelerity.org
Accelerating ASIC Design Through Reuse
• BaseJump: open-source polymorphic HW components
  • Design libraries: BSG IP cores, BGA package, I/O pad ring
  • Test infrastructure: DoubleTrouble PCB, RealTrouble PCB
  • Available at bjump.org
• RISC-V: open-source ISA
  • Rocket core: high-performance RV64G in-order core
  • Vanilla-5: high-efficiency RV32IM in-order core
• RoCC: open-source on-chip interconnect
  • Common interface to connect all 3 compute tiers
• Extensible designs
  • BSG Manycore: fully parameterized RTL and APR scripts
• Third-party IP
  • ARM standard cells, I/O cells, RF/SRAM generators
BaseJump: Designing the DNA for Open Source ASICs
http://www.bjump.org
In the Bespoke Silicon Group (BSG), We Think Building Hardware Is An Epic Sport
BaseJump On-Chip Clock Generator
ASIC Tapeouts
• BSG-Loopback (180nm) – November 2016
• BSG-X (180nm) – December 2016
• Celerity (16nm) – April 2017
6 months
Let’s Build the DNA for Open Source ASICs
• The components required to build a full system
  • RTL design (NOCs, async crossers, arbiters, FIFOs, …)
  • IP cores (high-speed IO, PLLs, CPU, …)
  • Hardware emulation
  • Socket (package and padring)
  • PCB motherboard
• Digital ASIC systems share a lot of “DNA”
  • Many of these components are very common
  • Only minor modifications (if any) are needed
  • Every chip inherits many defaults from the last; gets easier and easier
• What if we could share a “base class” for ASICs across the world and extend to fit our system requirements?
BaseJump: An Open Source “Base Class” For ASIC Designs
• A collection of open source components that act as a starting place for each step in building an end-to-end system
• Major components
  • BaseJump STL
  • BaseJump Socket
    • IO padring
    • BGA package
  • BaseJump Motherboards
    • Emulation PCB, ASIC PCB
  • BaseJump Rocket & RV-IOV Adapters
  • BaseJump Manycore
    • The manycore you saw earlier
  • BaseJump FPGA Bridge
http://bjump.org
All the hardware you are about to see has complete designs on our website, and we’ll tell you who to send it to in order to get it fabricated.
BaseJump STL
• Like C++ STL but for SystemVerilog
• Many essential IP blocks (stop building them over and over!)
  • Basic building blocks (FIFOs, crossbars, arbiters)
  • Intuitive interfaces
  • Highly parameterizable
• More advanced IP blocks
  • Clock generators
  • High-speed source-synchronous I/O
  • SoC configuration interface (like SPI or JTAG)
• Verified and validated
  • Unit testing and regression testing suites
  • Many components are silicon proven
![Page 39: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/39.jpg)
BaseJump STL Components
| Package | Example IP Cores |
| --- | --- |
| bsg_misc | popcount, flop trays, decoders, lfsr, multiplies, flexible muxes, transposers, crossbars, gray_to_binary, priority encoder, thermometer encoders, counters |
| bsg_async | asynchronous fifos and interfaces |
| bsg_clk_gen | synthesizable digital clock generator |
| bsg_comm_link | high-speed I/O source-synchronous interface |
| bsg_dataflow | FIFOs, stream mergers, round-robin arbitrators, serial-to-parallel converters |
| bsg_fsb | front side bus (high-speed bridge between off-chip and on-chip worlds) |
| bsg_mem | portability layer for SRAMs |
| bsg_mesosync | mesosynchronous I/O library (high speed + low latency) |
| bsg_noc | network-on-chip building blocks, RISC-V interface logic |
| bsg_tag | SoC configuration interface (like SPI or JTAG) |
| bsg_test | Test bench blocks: reset generators, delay lines, clock gens |

Several hundred modules, all parameterized
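Several of the blocks named above (popcount, gray_to_binary, thermometer encoders, round-robin arbitration) have simple functional definitions. The Python reference models below are a sketch of what such blocks compute, of the kind one might check RTL against in a test bench. They are not taken from BaseJump STL, and every function name and signature here is an assumption for illustration.

```python
from typing import Optional


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def binary_to_gray(b: int) -> int:
    """Reflected Gray code of b: adjacent values differ in one bit."""
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse of binary_to_gray: XOR together all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def thermometer_encode(n: int, width: int) -> int:
    """Value with the low n bits set, for 0 <= n <= width."""
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and width")
    return (1 << n) - 1


def round_robin_grant(reqs: int, width: int, last: int) -> Optional[int]:
    """Index of the first requester after `last`, wrapping around.

    reqs is a bit vector of requests; returns None if nobody requests.
    """
    for offset in range(1, width + 1):
        i = (last + offset) % width
        if (reqs >> i) & 1:
            return i
    return None
```

For example, with requests on bits 1 and 3 of a 4-wide arbiter, the grant alternates between the two requesters as `last` advances, so neither is starved.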
![Page 40: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/40.jpg)
BaseJump Socket: Standard I/O Interfaces for Your ASIC
• Complete suite for speaking out of your chip to an FPGA at high frequency
  • RTL Code for ASIC (bsg_comm_link)
  • RTL Code for FPGA “ ”
  • Standard I/O padring
  • BGA Package
  • BGA Socket
• Optimized for signal integrity and cost
  • DDR, source synchronous
  • Single-ended
• Starting point for new padring designs
  • Can repurpose pins
    • Switch pin directions
    • Replace with analog pins
    • Add more power domains
Inspired DARPA CRAFT Flipchip Sockets!
[Figure labels: BGA Padring, BGA Socket, BGA Substrate]
![Page 41: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/41.jpg)
BaseJump: Motherboards
• DoubleTrouble: pre-tapeout HW emulation
• RealTrouble: post-tapeout ASIC bring-up
• BaseJump includes open firmware for FPGA
  • Can be used on DoubleTrouble and RealTrouble
• Can connect to Xilinx dev boards over FMC
  • Allows us to use Xilinx’s IP cores (DRAM, PCIe, …)
Plug your ASIC into this socket and you are done!
![Page 42: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/42.jpg)
HDL Design
BaseJump STL: Standard library of hardware components
http://bjump.org
![Page 43: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/43.jpg)
BaseJump Socket: IO Padring
HDL Design
BaseJump STL: Standard library of hardware components
http://bjump.org
![Page 44: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/44.jpg)
BaseJump DoubleTrouble: HW Emulation Motherboard
BaseJump Socket: IO Padring
HDL Design
BaseJump STL: Standard library of hardware components
http://bjump.org
![Page 45: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/45.jpg)
BaseJump DoubleTrouble: HW Emulation Motherboard
BaseJump Socket: IO Padring
HDL Design
BaseJump STL: Standard library of hardware components
BaseJump: Open FPGA Firmware
http://bjump.org
![Page 46: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/46.jpg)
BaseJump DoubleTrouble: HW Emulation Motherboard
BaseJump Socket: IO Padring
HDL Design
BaseJump STL: Standard library of hardware components
BaseJump: Open FPGA Firmware
Tape out!
http://bjump.org
![Page 47: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/47.jpg)
BaseJump DoubleTrouble: HW Emulation Motherboard
BaseJump Socket: IO Padring
HDL Design
BaseJump STL: Standard library of hardware components
BaseJump: Open FPGA Firmware
BaseJump Socket: BGA Package
http://bjump.org
![Page 48: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/48.jpg)
BaseJump DoubleTrouble: HW Emulation Motherboard
BaseJump Socket: IO Padring
BaseJump RealTrouble: Bring-up Motherboard
HDL Design
http://bjump.org
BaseJump STL: Standard library of hardware components
BaseJump Socket: BGA Package
BaseJump: Open FPGA Firmware
![Page 49: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/49.jpg)
Outline
The Celerity Open Source RISC-V Tiered Accelerator Fabric: Fast Architectural Design Methodologies for Fast Chips
BaseJump: Designing the DNA for Open Source ASICs
BaseJump Manycore: “Open Source for GPU”
http://www.opencelerity.org
http://www.bjump.org
http://www.bjump.org/manycore
![Page 50: Wed0900 Celerity - An Open Source 511-core RISC-V Tiered ......•Operating system (e.g. Linux & TCP/IP Stack) •Interrupt and Exception handling •Program dispatch and control flow](https://reader034.fdocuments.us/reader034/viewer/2022042111/5e8bdb91a5fb271c07365839/html5/thumbnails/50.jpg)
Thanks
Ritchie Zhao, Chun Zhao, Shaolin Xie, Bandhav Veluri, Luis Vega, Christopher Torng, Ningxiao Sun, Austin Rovinski, Anuj Rao, Gai Liu, Paul Gao, Scott Davidson, Steve Dai, Aporva Amarnath, Khalid Al-Hawaj, Tutu Ajayi
Christopher Batten, Ronald G. Dreslinski, Rajesh K. Gupta, Michael B. Taylor, Zhiru Zhang