CAVA: Open Source Infrastructure for System-On-a-Chip Designs
Peter Hsu, PhD
Peter Hsu Consulting, Inc., 43551 Mission Blvd., Fremont, CA 94539
email: peterhsu@cs.wisc.edu
Presented 14 March 2002 at the University of Wisconsin in Madison
Design Infrastructure
Hardware "Intellectual Properties"
  – Processor Core(s)
  – Multiprocessor Bus, External Interfaces
Software Development Environment
  – GNU Compiler: gcc, as, ld; Utilities: glibc, gdb, …
  – Linux Operating System: Kernel, Drivers
Analysis Tools
  – Performance Simulator(s)
  – Architecture Verification Programs
Multiprocessor Organization
Figure: several CPUs, each with its own L1 cache ($) and IO interface, share a memory controller; one CPU adds new instructions and a complex IO interface, and application-specific logic attaches alongside.
Cost-Performance Evolution
Figure: die area shrinking 60 mm² → 30 mm² → 15 mm² across process generations:
  – 0.18 µm: 500/wafer, 70% yield, $10
  – 0.13 µm: 1000/wafer, 85% yield, $4
  – 0.09 µm: 90% yield, $2
  – 2–4 CPUs + 8 MB DRAM: $5
  – Clock rates: 400 MHz, 600 MHz, 800 MHz, 1 GHz (+30%)
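The chart's prices follow the usual die-cost relation: wafer cost divided by the number of good dies. The Python sketch below reads "500/wafer" and "1000/wafer" as dies per wafer; that reading is our interpretation, not stated on the slide. Under that assumption it backs out the wafer price each quoted die cost implies. The function names are ours.

```python
def die_cost(wafer_cost: float, dies_per_wafer: int, yield_frac: float) -> float:
    """Cost of one good die: the wafer cost spread over the good dies."""
    return wafer_cost / (dies_per_wafer * yield_frac)


def implied_wafer_cost(die_cost_usd: float, dies_per_wafer: int, yield_frac: float) -> float:
    """Invert die_cost: the wafer price that produces the quoted die price."""
    return die_cost_usd * dies_per_wafer * yield_frac


# Chart values; "dies per wafer" is our reading of "500/wafer" and "1000/wafer".
for node, price, dies, y in [("0.18 um", 10, 500, 0.70), ("0.13 um", 4, 1000, 0.85)]:
    print(node, round(implied_wafer_cost(price, dies, y)))
```

This prints roughly $3,500 and $3,400. If the reading is right, the cost drop comes almost entirely from more dies per wafer and better yield, not from cheaper wafers.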
Author's Perspective
Maybe Boring…
  – "Decent Computer Using Crummy Technology"
    • Example: Seymour Cray's vs. IBM's Approach
  – "Get More out of Existing Technology"
    • Hybrid Petro-Electric Car
Cost Drives Application Breadth
  – "900 MIPS Make a Better Light Bulb"
  – "Earring PDA Helps You Remember Dates"
  – "Silicon Sequin Dress Adapts to Weather"
Efficiency Matters: $P \propto F \cdot C \cdot V^2$
Frequency (F)
  – Fewer Instructions, Lower Latencies
Capacitance (C)
  – Fewer Gates, Narrower Bus, …
Voltage (V)
  – Slower Gates, Less Logic per Cycle
Same Design
  – 600 MHz, 1.2 V: 4 W → 250 MHz, 0.6 V: ½ W
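The "same design" example follows from the power relation once capacitance is held fixed. Below is a minimal Python sketch. It assumes dynamic power scales linearly with F and with V², and it ignores leakage. The function name is ours.

```python
def scaled_power(p0_w: float, f0_hz: float, v0_v: float,
                 f1_hz: float, v1_v: float) -> float:
    """Dynamic power at a new (F, V) operating point for the same design.

    P is proportional to F * C * V**2. With C fixed, the ratio of new to old
    power is (F1/F0) * (V1/V0)**2. Leakage power is ignored.
    """
    return p0_w * (f1_hz / f0_hz) * (v1_v / v0_v) ** 2


p = scaled_power(4.0, 600e6, 1.2, 250e6, 0.6)
print(f"{p:.2f} W")
```

This gives about 0.42 W, which the slide rounds to ½ W. The 2.4× frequency cut and the 4× gain from halving the voltage combine into roughly a 10× power reduction (4 W / 0.417 W = 9.6).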
Why Open Source?
Embedded Market
  – High Volume, Low Prices, Little Profit, Small Engineering Budget, Few Inventions…
  – Most SoCs Use 1970s Architectures
    • Motorola 6800, Intel 8080/x86 Family (e.g., Z80)
  – Royalty-Payment-Based Innovation
    • ARM, MIPS, Tensilica, …: Future Unclear
Industry Ripe for New Market Dynamics
  – CAVA: Set a New Cost-Performance Standard
  – No Royalties: New Players, Opportunities
Key Decisions
"RTL" Design
  – Integration Effort, Process Migration, Accessibility
    • More Important Than Pure Performance
  – Logic Synthesis, Standard Cells, P&R
    • No Dynamic Circuits, No Custom Layout
New ISA
  – Companies: "Open" Architectures Feel "Unsafe"
    • Originator Company Big and Mean
  – Neutrality: "Perception Is Reality"
    • Ex: Linux Very x86-Centric but Perceived Otherwise
Why ASIC CPUs Are Usually Slow
Design Issue
  – Static CMOS vs. Dynamic Circuits
    • High Fan-In Gates: Cache Hit Detection, TLB
    • ASIC SRAM Very Fast ("Hard Macro")
  – Fruitful Area for Innovations…
Manufacturing Issue
  – Make Few Wafers: Worst-Case Design
  – Make Many Wafers: Typically 50% Faster
    • 2 GHz Pentium 4, ISSCC Chips
  – SoC: "Make Cheaper, Goes Faster"
Ex: Tag Match Critical Path
Dynamic vs. Static (delays in inverter delays)
  – Adder: 20 vs. 30
  – Tag: 15 vs. 25
  – Data: 20 vs. 25
  – Match: 5 vs. 20
2-Cycle Load
  – Dynamic: (20+15+5)/2 = 20
  – Static: (30+25+20)/2 = 37.5 ≈ 38
  – 800 vs. 400 MHz
    • Inverter Delay = 63 ps
Figure: load path, address adder → data and tag arrays → tag compare (=), drawn for both circuit styles; static delays annotated 30, 25, 20.
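The slide's arithmetic can be reproduced in a few lines. The Python sketch below assumes the load splits perfectly evenly over two cycles, with no latch or clock-skew overhead. It also assumes the data-array path runs in parallel with the tag path, which the formulas imply because they omit the data delay. Names are ours.

```python
INV_DELAY_PS = 63  # inverter delay, from the slide

# Stage delays in inverter delays as (dynamic, static), from the slide.
DELAYS = {
    "adder": (20, 30),
    "tag": (15, 25),
    "data": (20, 25),
    "match": (5, 20),
}


def load_cycle(style: int, cycles: int = 2) -> float:
    """Cycle length in inverter delays for a load spread evenly over `cycles`.

    style 0 = dynamic, 1 = static. The tag path is adder -> tag array -> match.
    The data path (adder -> data array) is assumed to run in parallel with it.
    """
    d = {stage: pair[style] for stage, pair in DELAYS.items()}
    tag_path = d["adder"] + d["tag"] + d["match"]
    data_path = d["adder"] + d["data"]
    return max(tag_path, data_path) / cycles


for name, style in (("dynamic", 0), ("static", 1)):
    per_cycle = load_cycle(style)
    mhz = 1e6 / (per_cycle * INV_DELAY_PS)  # period in ps -> frequency in MHz
    print(f"{name}: {per_cycle} inverter delays/cycle, ~{mhz:.0f} MHz")
```

This prints about 794 MHz and 423 MHz, the slide's rounded 800 vs. 400 MHz. In both styles the tag-and-match path, not the data array, sets the cycle.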
Recurrence Forward Substitution
Replicate ALU
  – (30+25)/2 = 27.5 → 600 MHz
    • vs. 800 / 400 MHz
  – 5K gates, 0.1 mm²
Opportunity
  – Most CPU Knowledge Comes from Custom Designs
  – Experiment Using Logic Synthesis, Place & Route
Figure: two adder → data/tag → compare (=) paths, with delays annotated 30, 20, 25, 25.
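The Python sketch below converts the three per-cycle figures to clock rates using the 63 ps inverter delay. It takes the slide's (30+25)/2 figure for the replicated-ALU case as given and does not try to rederive it from the figure. Names are ours.

```python
INV_DELAY_PS = 63  # inverter delay, from the tag-match slide


def mhz(inverter_delays_per_cycle: float) -> float:
    """Clock rate for a cycle of the given length (1e6 ps = 1 us, so 1e6/ps = MHz)."""
    return 1e6 / (inverter_delays_per_cycle * INV_DELAY_PS)


cases = {
    "dynamic": (20 + 15 + 5) / 2,
    "static": (30 + 25 + 20) / 2,
    "static + replicated ALU": (30 + 25) / 2,  # slide's figure, taken as given
}
for label, per_cycle in cases.items():
    print(f"{label}: {per_cycle} inverter delays -> ~{mhz(per_cycle):.0f} MHz")
```

The results are about 794, 423, and 577 MHz, which the slide rounds to 800, 400, and 600 MHz. Replicating a 5K-gate ALU recovers about 150 MHz of the static design's deficit without dynamic circuits.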
Instruction Set
RISC Lessons: Many Registers, Everything Else
Code Density
  – Silicon: CPU ≈ 3 MBytes of DRAM
  – Applications: Millions of Lines of Code
24-bit Instructions
  – Two 64-Register Specifiers
  – 32-bit Address
Figure: instruction formats built from fields of 6, 4, 6, and 8 bits: x = x op y; x = x op imm; x = (y+disp); and a longer x, y, op form carrying a 24-bit offset.
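The 24-bit budget adds up. Two 6-bit specifiers for 64 registers leave 12 bits for the opcode and the immediate or displacement, which matches the 6-4-6-8 field widths in the figure. The Python sketch below checks that arithmetic and estimates the code-size saving over 32-bit instructions. It makes a simplifying assumption of its own: the program needs the same number of instructions in either encoding.

```python
REGISTERS = 64
REG_FIELD_BITS = (REGISTERS - 1).bit_length()  # bits to name one of 64 registers
FIELD_WIDTHS = (6, 4, 6, 8)                    # field widths from the format figure


def code_bytes(n_instructions: int, bits_per_insn: int) -> int:
    """Static code size for a fixed instruction width."""
    return n_instructions * bits_per_insn // 8


assert REG_FIELD_BITS == 6
assert sum(FIELD_WIDTHS) == 24  # two register fields plus 12 bits for op/imm/disp

n = 1_000_000  # hypothetical program size in instructions
saving = 1 - code_bytes(n, 24) / code_bytes(n, 32)
print(f"24-bit vs. 32-bit instructions: {saving:.0%} less code")
```

Under that assumption the saving is 25%. That is the kind of saving that matters when the slide equates the CPU's silicon with only about 3 MBytes of DRAM.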
Variable-Length Instructions
Superscalar Paradigm
  – Speculatively Decode, Ignore Some
    • Middle vs. Always Instructions at End
  – Idiom Acceleration
    • Set High 16 Bits + Load with 16-bit Displacement
Key Features
  – Lengths in Multiples (Seymour Cray's Parcel)
    • Else Many Wasted Decode Stations
  – Opcode, Register Fields in 1st Parcel
    • Single-Issue Design: Conserve Gates, Power
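When lengths are whole parcels and everything a decoder needs is in the first parcel, the next instruction boundary is known after inspecting a single parcel. The Python sketch below illustrates that idea under a wholly hypothetical encoding: 24-bit parcels, with the top 2 bits of the first parcel holding the instruction length minus one. The slides do not give CAVA's actual parcel width or length encoding, and all names are ours.

```python
from typing import List

PARCEL_BITS = 24  # hypothetical parcel width; the slides do not fix it


def insn_length(first_parcel: int) -> int:
    """Instruction length in parcels, decoded from the first parcel alone.

    Hypothetical rule: the top 2 bits of the first parcel hold (length - 1),
    so instructions are 1 to 4 parcels long.
    """
    if not 0 <= first_parcel < (1 << PARCEL_BITS):
        raise ValueError("parcel out of range")
    return (first_parcel >> (PARCEL_BITS - 2)) + 1


def split_instructions(parcels: List[int]) -> List[List[int]]:
    """Split a parcel stream into instructions, one length decode per instruction."""
    out: List[List[int]] = []
    i = 0
    while i < len(parcels):
        n = insn_length(parcels[i])
        if i + n > len(parcels):
            raise ValueError(f"truncated instruction at parcel {i}")
        out.append(parcels[i:i + n])
        i += n
    return out


stream = [0x000001, 0x400002, 0xABCDEF, 0x000003]
print(split_instructions(stream))
```

Here the second instruction spans two parcels because the top bits of its first parcel are 01. Only first parcels are ever examined, so a single-issue decoder needs just one length decoder rather than one per possible starting position.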
Why a Multiprocessor Architecture?
Efficiency
  – Area
    • More MIPS per Gate
  – Power
    • Less Synchronous
    • Effective Clock Gating
Ease of Use
  – Real Time
    • vs. Fancy Scheduling
  – Customization
Figure: "Conventional Integration" (superscalar uniprocessor, memory controller, Eth1, Eth2, PCI 1, PCI 2, USB) vs. "CAVA-1" (three CPU + thin IO blocks sharing a memory controller).
IO Architecture Evolution
Typical IO Controller Chip
  – Mixed Analog/Digital Physical Media Interface
  – Protocol Encapsulation, Error Handling
  – Flow Control, Interrupts, DMA, Buffering
Market Forces
  – Complex Controller Chips Are Cost-Effective
    • Economy of Scale Outweighs Gate Utilization
  – Integrate Existing Chips into SoC
    • Minimize Engineering, Less "Risk"
Is a "Holistic" Solution Better?
CAVA IO Architecture
Processor Doubles as IO Controller
  – "Background" Context
    • 64 Registers, Overlapped Execution, Speculative
  – "Foreground" Context
    • 4 Registers, De-pipelined (1 Instruction / 4 Clocks)
    • Branch on IO Register Bit (e.g., Error, Data Ready, …)
  – Dedicated Gates Only for Physical Media Interface
Open Questions
  – Efficiency: Dedicated Logic, Interference
  – Need Good Abstraction
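The slide does not say how foreground and background contexts share one pipeline. The Python sketch below assumes the simplest reading that fits "1 instruction / 4 clocks". Each clock, the next ready foreground context issues in round-robin order; a foreground context then blocks for 4 clocks until its de-pipelined instruction completes; and any clock with no ready foreground context goes to the background context. The policy and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

FG_LATENCY = 4  # de-pipelined: one foreground instruction per 4 clocks


@dataclass
class ForegroundContext:
    ready_at: int = 0  # first clock at which this context may issue again
    issued: int = 0    # instructions issued so far


def simulate(n_fg: int, clocks: int) -> Tuple[List[int], int]:
    """Hypothetical issue policy for one shared pipeline.

    Each clock, pick the next ready foreground context in round-robin order;
    if none is ready, the background context gets the issue slot.
    Returns (instructions issued per foreground context, background slots).
    """
    fg = [ForegroundContext() for _ in range(n_fg)]
    rr = 0  # round-robin starting point
    background_slots = 0
    for clk in range(clocks):
        chosen = None
        for k in range(n_fg):
            idx = (rr + k) % n_fg
            if fg[idx].ready_at <= clk:
                chosen = fg[idx]
                rr = (idx + 1) % n_fg
                break
        if chosen is None:
            background_slots += 1  # background fills the idle slot
        else:
            chosen.issued += 1
            chosen.ready_at = clk + FG_LATENCY  # blocked until it completes
    return [c.issued for c in fg], background_slots


print(simulate(1, 100))
print(simulate(4, 100))
```

With one foreground context, the background keeps 75 of 100 issue slots. Four foreground contexts saturate the pipeline, which is one way to frame the "interference" open question.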
CAVA-1 Memory System
Initial Target
  – No L2 Cache
    • If 6-T Memory Cell: Bigger L1, Same Cycle Time
  – Not Rambus
    • Royalty Payment, RAC Process Migration Difficulties
DRAM
  – FCRAM: "Slightly Better DRAM"
    • Cache Miss Penalty 50 ns = 30 Cycles at 600 MHz
    • 64-bit Interface: 2.4 GB/s ≈ 1 Byte/Cycle/CPU
  – Flexibility Very Important
    • Also Use Standard DDR SDRAM
    • 64-, 32-, and 16-bit Wide (4, 2, or 1 Chips)
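The memory figures on this slide are unit conversions, reproduced below in Python. The CPU count of 4 is our assumption: it is what makes 2.4 GB/s come out to 1 byte/cycle/CPU at 600 MHz, and it sits within the 2–4 CPUs of the cost chart. The transfer rate is derived here, not quoted on the slide.

```python
CPU_HZ = 600e6     # target clock
MISS_NS = 50       # FCRAM cache-miss penalty
BUS_BYTES = 8      # 64-bit memory interface
BANDWIDTH = 2.4e9  # bytes per second
N_CPUS = 4         # assumption: implied by "1 byte/cycle/CPU"

miss_cycles = MISS_NS * CPU_HZ / 1e9     # ns x cycles/s -> cycles
transfers_per_s = BANDWIDTH / BUS_BYTES  # 64-bit transfers per second
bytes_per_cycle = BANDWIDTH / CPU_HZ     # shared by all CPUs
bytes_per_cycle_per_cpu = bytes_per_cycle / N_CPUS

print(f"miss penalty: {miss_cycles:.0f} cycles")
print(f"bus: {transfers_per_s / 1e6:.0f} M transfers/s")
print(f"{bytes_per_cycle:.1f} B/cycle total, {bytes_per_cycle_per_cpu:.1f} B/cycle/CPU")
```

This gives 30 cycles per miss and 300 M transfers/s. That is 4 bytes per CPU clock in total, or 1 byte/cycle for each of four CPUs. A 32- or 16-bit configuration would halve or quarter those bandwidth figures.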
Multi-Thread Support
Memory Latency Impact
  – 5% Miss × 30 Cycles = 1.5 CPI: Half Idle
  – Significant Improvement, Lots of Silicon Area
2-Port SRAM as Register File
  – 64 × 64 → 256 × 64: +50% Area
  – CAVA-1: 3 Background, 16 Foreground
Interesting Questions
  – "Helper Thread" for Serial Program
  – Quantify Speculation: Area, Clock Speed, Perf
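The register-file numbers fit together exactly. Three background contexts of 64 registers plus 16 foreground contexts of 4 registers give 256 rows, matching the 256 × 64 SRAM; the 4-register foreground context comes from the IO architecture slide. The Python sketch below shows one way to lay the contexts out. The row ordering (background first) and the function name are our assumptions.

```python
BG_CONTEXTS, BG_REGS = 3, 64  # background contexts x registers each
FG_CONTEXTS, FG_REGS = 16, 4  # foreground contexts x registers each
ROWS = BG_CONTEXTS * BG_REGS + FG_CONTEXTS * FG_REGS  # 192 + 64
assert ROWS == 256  # fills the 256 x 64 register-file SRAM exactly


def row(kind: str, ctx: int, reg: int) -> int:
    """Map (context kind, context number, register) to an SRAM row.

    Hypothetical layout: background contexts first, then foreground contexts.
    """
    if kind == "bg":
        if not (0 <= ctx < BG_CONTEXTS and 0 <= reg < BG_REGS):
            raise IndexError((kind, ctx, reg))
        return ctx * BG_REGS + reg
    if kind == "fg":
        if not (0 <= ctx < FG_CONTEXTS and 0 <= reg < FG_REGS):
            raise IndexError((kind, ctx, reg))
        return BG_CONTEXTS * BG_REGS + ctx * FG_REGS + reg
    raise ValueError(f"unknown context kind {kind!r}")


print(row("bg", 2, 63), row("fg", 0, 0), row("fg", 15, 3))
```

The last background register lands on row 191, and the foreground contexts occupy rows 192 through 255. Because every context maps to one contiguous range, a context switch only changes the high bits of the register-file address.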
Project Status/Roadmap
Phase 1: Running (Limping :-)
  – gcc, as, ld, glibc, gdb, ISA emulator
Phase 2: Fall 2002
  – Linux Kernel, IO Controller Programs
    • Standalone PC Emulator, FPGA PCI Card for IO
    • Key Decisions re: Verification Strategy
  – Quantitative Analysis of Clock Cycle
    • Target 600 MHz, 0.18 µm (typical process, worst-case environment)
Phase 3: Spring 2003
  – Design RTL, Tapeout Party, Debug, …
ISA Research Tool
Instruction Descriptions
  – Syntax in:
    • Lengths, Field Definitions, Opcode Encodings
        iR   imm[8] [6] x[6] [4=0]   Immediate with Register
        bR   [8] y[6] x[6] [4=F]     Binary Register Operator
  – Semantics in:
    • gcc "md" Patterns, Emulator C Statement
        iR  ia  addi  x += imm  Add Immediate
        (set (match_operand:DI 0 "register_operand" "=r")
             (plus:DI (match_operand:DI 1 "register_operand" "0")
                      (match_operand:DI 2 "immediate_operand" "I")))
Consistent: gcc, as, ld, gdb, emulator, doc
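The idea behind this slide is that one machine-readable description of each format feeds every tool. The Python sketch below applies that idea to the two 24-bit formats above. It makes two assumptions the slide does not settle: fields are listed most-significant first, and the unnamed fields are called `op`. Neither the bit order nor those field names appear on the slide.

```python
from typing import Dict, List, Tuple

# A format is a list of (name, width) fields, most-significant first (our
# assumption about bit order). "=X" fields hold a fixed hex value; "op" is our
# placeholder name for the slide's unnamed fields.
Format = List[Tuple[str, int]]

IR: Format = [("imm", 8), ("op", 6), ("x", 6), ("=0", 4)]  # Immediate with Register
BR: Format = [("op", 8), ("y", 6), ("x", 6), ("=F", 4)]    # Binary Register Operator


def encode(fmt: Format, **fields: int) -> int:
    """Pack named field values into one instruction word."""
    word = 0
    for name, width in fmt:
        value = int(name[1:], 16) if name.startswith("=") else fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(fmt: Format, word: int) -> Dict[str, int]:
    """Unpack a word using the same description, checking fixed fields."""
    out: Dict[str, int] = {}
    shift = sum(width for _, width in fmt)
    for name, width in fmt:
        shift -= width
        value = (word >> shift) & ((1 << width) - 1)
        if name.startswith("="):
            if value != int(name[1:], 16):
                raise ValueError(f"fixed field {name} holds {value:X}")
        else:
            out[name] = value
    return out


assert sum(w for _, w in IR) == sum(w for _, w in BR) == 24
word = encode(IR, imm=5, op=1, x=3)
print(hex(word), decode(IR, word))
```

`encode` and `decode` walk the same list, so changing a field width updates both at once. That is the consistency the slide wants across gcc, as, ld, gdb, the emulator, and the documentation.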
Conclusion
New Technology
  – Creation Only the Beginning
  – Adoption: Critical Mass, Understand, Can Modify
    • Necessary to Evangelize, to Share
Vision
  – Modular Architectural Enhancements by Many
  – Invent Novel SoCs Uninhibited by Lawyers
It's Fun
  – Periodic Open Releases of Compiler, RTL
  – Participate: Build a Computer System from Scratch
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
6
Efficiency Matters F Efficiency Matters F C C V V22
Frequency (F)ndash Fewer Instructions Lower Latencies
Capacitance (C)ndash Fewer Gates Narrower Bus hellip
Voltage (V)ndash Slower Gates Less Logic per Cycle
Same Designndash 600 MHz 12 V 4 W frac12 W 250 MHz 06 V
CAVA Open Source Infrastructure for System-On-a-Chip Designs
7
Why Open SourceWhy Open Source
Embedded Marketndash High Volume Low Prices Little Profit Small
Engineering Budget Few Inventionshellipndash Most SoC Use 1970rsquos Architecture
bull Motorola 6800 x86 Family eg z80
ndash Royalty Payment Based Innovationbull ARM MIPS Tensilica hellip Future Unclear
Industry Ripe for New Market Dynamicsndash Cava Set New Cost-Performance Standardndash No Royalties New Players Opportunities
CAVA Open Source Infrastructure for System-On-a-Chip Designs
8
Key DecisionsKey Decisions
ldquoRTLrdquo Designndash Integration Effort Process Migration Accessibility
bull More Important Than Pure Performance
Logic Synthesis Standard Cells PampRbull No Dynamic Circuits No Custom Layout
New ISAndash Companies ldquoOpenrdquo Architectures Feel ldquoUnsaferdquo
bull Originator Company Big and Mean
ndash Neutrality ldquoPerception is Realityrdquobull Ex Linux Very x86-centric but Perceived Otherwise
CAVA Open Source Infrastructure for System-On-a-Chip Designs
9
Why ASIC CPU Usually SlowWhy ASIC CPU Usually Slow
Design Issuendash C-MOS vs Dynamic Circuits
bull High Fan-In Gate Cache Hit Detection TLBbull ASIC SRAM Very Fast (ldquoHard Macrordquo)
ndash Fruitful Area for Innovationshellip
Manufacturing Issuendash Make Few Wafers Worse-Case Designndash Make Many Wafers Typically 50 Faster
bull 2GHz Pentium 4 ISSCC Chips
ndash SoC ldquoMake Cheaper Goes Fasterrdquo
CAVA Open Source Infrastructure for System-On-a-Chip Designs
10
Ex Tag Match Critical PathEx Tag Match Critical Path
Dynamic vs Staticndash Adder 20 30ndash Tag 15 25ndash Data 20 25ndash Match 5 20
2-Cycle Loadndash D (20+15+5)2 = 20ndash S (30+25+20)2 = 38ndash 800 vs 400 MHz
bull Inv Delay = 63ps
Adder
DataTag
=
DataTag
=
30
25
20
CAVA Open Source Infrastructure for System-On-a-Chip Designs
11
Recurrance Forward SubstitutionRecurrance Forward Substitution
Replicate ALUndash (30+25)2 = 275ndash 600 MHz
bull vs 800 400 MHz
ndash 5K gates 01mm2
Opportunityndash Most CPU Knowledge
from Custom Designsndash Experiment Using Logic
Synthesis PlaceampRoute
Adder
DataTag
= Adder
DataTag
=3020
25 25
CAVA Open Source Infrastructure for System-On-a-Chip Designs
12
Instruction SetInstruction Set
RISC Lessons Many Registers
Everything Else
Code Densityndash Silicon CPU 3
MBytes of DRAMndash Applications Millions
of Lines of Code
24-bit Instructionsndash Two 64-reg Specifiers
32-bit address x -op
x -opimm
x opydisp
x -yop24-bit offset
6 468
x = x op y
x = x op imm
x = (y+disp)
x -yop
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
7
Why Open SourceWhy Open Source
Embedded Marketndash High Volume Low Prices Little Profit Small
Engineering Budget Few Inventionshellipndash Most SoC Use 1970rsquos Architecture
bull Motorola 6800 x86 Family eg z80
ndash Royalty Payment Based Innovationbull ARM MIPS Tensilica hellip Future Unclear
Industry Ripe for New Market Dynamicsndash Cava Set New Cost-Performance Standardndash No Royalties New Players Opportunities
CAVA Open Source Infrastructure for System-On-a-Chip Designs
8
Key DecisionsKey Decisions
ldquoRTLrdquo Designndash Integration Effort Process Migration Accessibility
bull More Important Than Pure Performance
Logic Synthesis Standard Cells PampRbull No Dynamic Circuits No Custom Layout
New ISAndash Companies ldquoOpenrdquo Architectures Feel ldquoUnsaferdquo
bull Originator Company Big and Mean
ndash Neutrality ldquoPerception is Realityrdquobull Ex Linux Very x86-centric but Perceived Otherwise
CAVA Open Source Infrastructure for System-On-a-Chip Designs
9
Why ASIC CPU Usually SlowWhy ASIC CPU Usually Slow
Design Issuendash C-MOS vs Dynamic Circuits
bull High Fan-In Gate Cache Hit Detection TLBbull ASIC SRAM Very Fast (ldquoHard Macrordquo)
ndash Fruitful Area for Innovationshellip
Manufacturing Issuendash Make Few Wafers Worse-Case Designndash Make Many Wafers Typically 50 Faster
bull 2GHz Pentium 4 ISSCC Chips
ndash SoC ldquoMake Cheaper Goes Fasterrdquo
CAVA Open Source Infrastructure for System-On-a-Chip Designs
10
Ex Tag Match Critical PathEx Tag Match Critical Path
Dynamic vs Staticndash Adder 20 30ndash Tag 15 25ndash Data 20 25ndash Match 5 20
2-Cycle Loadndash D (20+15+5)2 = 20ndash S (30+25+20)2 = 38ndash 800 vs 400 MHz
bull Inv Delay = 63ps
Adder
DataTag
=
DataTag
=
30
25
20
CAVA Open Source Infrastructure for System-On-a-Chip Designs
11
Recurrance Forward SubstitutionRecurrance Forward Substitution
Replicate ALUndash (30+25)2 = 275ndash 600 MHz
bull vs 800 400 MHz
ndash 5K gates 01mm2
Opportunityndash Most CPU Knowledge
from Custom Designsndash Experiment Using Logic
Synthesis PlaceampRoute
Adder
DataTag
= Adder
DataTag
=3020
25 25
CAVA Open Source Infrastructure for System-On-a-Chip Designs
12
Instruction SetInstruction Set
RISC Lessons Many Registers
Everything Else
Code Densityndash Silicon CPU 3
MBytes of DRAMndash Applications Millions
of Lines of Code
24-bit Instructionsndash Two 64-reg Specifiers
32-bit address x -op
x -opimm
x opydisp
x -yop24-bit offset
6 468
x = x op y
x = x op imm
x = (y+disp)
x -yop
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
8
Key DecisionsKey Decisions
ldquoRTLrdquo Designndash Integration Effort Process Migration Accessibility
bull More Important Than Pure Performance
Logic Synthesis Standard Cells PampRbull No Dynamic Circuits No Custom Layout
New ISAndash Companies ldquoOpenrdquo Architectures Feel ldquoUnsaferdquo
bull Originator Company Big and Mean
ndash Neutrality ldquoPerception is Realityrdquobull Ex Linux Very x86-centric but Perceived Otherwise
CAVA Open Source Infrastructure for System-On-a-Chip Designs
9
Why ASIC CPU Usually SlowWhy ASIC CPU Usually Slow
Design Issuendash C-MOS vs Dynamic Circuits
bull High Fan-In Gate Cache Hit Detection TLBbull ASIC SRAM Very Fast (ldquoHard Macrordquo)
ndash Fruitful Area for Innovationshellip
Manufacturing Issuendash Make Few Wafers Worse-Case Designndash Make Many Wafers Typically 50 Faster
bull 2GHz Pentium 4 ISSCC Chips
ndash SoC ldquoMake Cheaper Goes Fasterrdquo
CAVA Open Source Infrastructure for System-On-a-Chip Designs
10
Ex Tag Match Critical PathEx Tag Match Critical Path
Dynamic vs Staticndash Adder 20 30ndash Tag 15 25ndash Data 20 25ndash Match 5 20
2-Cycle Loadndash D (20+15+5)2 = 20ndash S (30+25+20)2 = 38ndash 800 vs 400 MHz
bull Inv Delay = 63ps
Adder
DataTag
=
DataTag
=
30
25
20
CAVA Open Source Infrastructure for System-On-a-Chip Designs
11
Recurrance Forward SubstitutionRecurrance Forward Substitution
Replicate ALUndash (30+25)2 = 275ndash 600 MHz
bull vs 800 400 MHz
ndash 5K gates 01mm2
Opportunityndash Most CPU Knowledge
from Custom Designsndash Experiment Using Logic
Synthesis PlaceampRoute
Adder
DataTag
= Adder
DataTag
=3020
25 25
CAVA Open Source Infrastructure for System-On-a-Chip Designs
12
Instruction SetInstruction Set
RISC Lessons Many Registers
Everything Else
Code Densityndash Silicon CPU 3
MBytes of DRAMndash Applications Millions
of Lines of Code
24-bit Instructionsndash Two 64-reg Specifiers
32-bit address x -op
x -opimm
x opydisp
x -yop24-bit offset
6 468
x = x op y
x = x op imm
x = (y+disp)
x -yop
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
9
Why ASIC CPU Usually SlowWhy ASIC CPU Usually Slow
Design Issuendash C-MOS vs Dynamic Circuits
bull High Fan-In Gate Cache Hit Detection TLBbull ASIC SRAM Very Fast (ldquoHard Macrordquo)
ndash Fruitful Area for Innovationshellip
Manufacturing Issuendash Make Few Wafers Worse-Case Designndash Make Many Wafers Typically 50 Faster
bull 2GHz Pentium 4 ISSCC Chips
ndash SoC ldquoMake Cheaper Goes Fasterrdquo
CAVA Open Source Infrastructure for System-On-a-Chip Designs
10
Ex Tag Match Critical PathEx Tag Match Critical Path
Dynamic vs Staticndash Adder 20 30ndash Tag 15 25ndash Data 20 25ndash Match 5 20
2-Cycle Loadndash D (20+15+5)2 = 20ndash S (30+25+20)2 = 38ndash 800 vs 400 MHz
bull Inv Delay = 63ps
Adder
DataTag
=
DataTag
=
30
25
20
CAVA Open Source Infrastructure for System-On-a-Chip Designs
11
Recurrence Forward Substitution
Replicate ALU
  - (30+25)/2 = 27.5
  - 600 MHz
    • vs 800, 400 MHz
  - 5K gates, 0.1 mm²
Opportunity
  - Most CPU Knowledge from Custom Designs
  - Experiment Using Logic Synthesis, Place&Route
[Figure: the Adder, Data/Tag, comparator (=) load path redrawn with the ALU replicated; delays 30, 20, 25, 25]
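Assuming the same 63 ps inverter delay as on the previous slide and an even split over the two load cycles, the 600 MHz figure works out as follows (compare 20 inverter delays per cycle, about 794 MHz, for dynamic logic and 37.5, about 423 MHz, for the original static path):

```latex
\[
t_{\text{cycle}} = \frac{30 + 25}{2} \times 63\ \text{ps} = 27.5 \times 63\ \text{ps} \approx 1.73\ \text{ns},
\qquad
f = \frac{1}{t_{\text{cycle}}} \approx 577\ \text{MHz} \approx 600\ \text{MHz}
\]
```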
Instruction Set
RISC Lessons: Many Registers, Everything Else
Code Density
  - Silicon: CPU ≈ 3 MBytes of DRAM
  - Applications: Millions of Lines of Code
24-bit Instructions
  - Two 64-reg Specifiers
[Figure: 24-bit instruction formats with fields of 6, 4, 6 and 8 bits: x = x op y; x = x op imm; x = (y+disp); plus a form with a 24-bit offset, labeled "32-bit address"]
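Two 64-register specifiers need 6 bits each, leaving 12 of the 24 bits for opcode and format, which matches the 8 + 4 split in the iR/bR layouts on the ISA Research Tool slide. The density argument is simple byte counting: one million 24-bit instructions take 3 MB, against 4 MB for a 32-bit RISC encoding (25% less). The C sketch below packs instructions three bytes apiece; the little-endian byte order and the helper names `put24`, `get24` and `code_bytes` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 64-register specifiers need 6 bits each, leaving 12 of 24 bits. */
enum { REG_BITS = 6, INSN_BITS = 24, SPARE_BITS = INSN_BITS - 2 * REG_BITS };

/* Store instruction i of a code buffer as 3 bytes (little-endian order assumed). */
static void put24(uint8_t *code, size_t i, uint32_t insn) {
    assert(insn <= 0xFFFFFFu);
    code[3 * i + 0] = (uint8_t)(insn & 0xFFu);
    code[3 * i + 1] = (uint8_t)((insn >> 8) & 0xFFu);
    code[3 * i + 2] = (uint8_t)((insn >> 16) & 0xFFu);
}

/* Read instruction i back from the byte stream. */
static uint32_t get24(const uint8_t *code, size_t i) {
    return (uint32_t)code[3 * i]
         | ((uint32_t)code[3 * i + 1] << 8)
         | ((uint32_t)code[3 * i + 2] << 16);
}

/* Code size in bytes for n fixed-width instructions. */
static size_t code_bytes(size_t n_insns, unsigned width_bits) {
    assert(width_bits % 8u == 0u);
    return n_insns * (width_bits / 8u);
}
```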
Variable Length Instructions
Superscalar Paradigm
  - Speculatively Decode, Ignore Some
    • Middle vs Always Instructions at End
  - Idiom Acceleration
    • Set High 16 Bits + Load with 16-bit Displacement
Key Features
  - Lengths in Multiples (Seymour Cray's Parcel)
    • Else Many Wasted Decode Stations
  - Opcode, Register Fields in 1st Parcel
    • Single-Issue Design: Conserve Gates, Power
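The "set high 16 bits + load with 16-bit displacement" idiom forms a 32-bit address from two pieces. The slide gives no encoding details, so the sketch below uses the familiar MIPS-style %hi/%lo convention as a stand-in: it assumes the displacement is sign-extended, which means the high half must be rounded up whenever bit 15 of the address is set. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit address into a "set high 16 bits" value and a signed 16-bit
   load displacement. The displacement is assumed to be sign-extended, so the
   high half is rounded up whenever bit 15 of the address is set. */
static void split_hi_lo(uint32_t addr, uint16_t *hi, int16_t *lo) {
    int32_t low = (int32_t)(addr & 0xFFFFu);
    if (low >= 0x8000) low -= 0x10000;     /* sign-extend the low half */
    *hi = (uint16_t)((addr + 0x8000u) >> 16);
    *lo = (int16_t)low;
}

/* Effective address formed by the idiom: (hi << 16) + sign-extended displacement. */
static uint32_t effective_address(uint16_t hi, int16_t lo) {
    return ((uint32_t)hi << 16) + (uint32_t)(int32_t)lo;
}
```

For example, 0x12348765 splits into hi = 0x1235 and a displacement of -30875, and recombining them yields the original address.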
Why Multiprocessor Architecture
Efficiency
  - Area
    • More MIPS per Gate
  - Power
    • Less Synchronous
    • Effective Clock Gating
Ease of Use
  - Real Time
    • vs Fancy Scheduling
  - Customization
[Figure: Conventional Integration, a superscalar uniprocessor with separate MemCtrl, Eth1, Eth2, PCI 1, PCI 2 and USB controllers, vs CAVA-1, several CPU + Thin IO units sharing a MemCtrl]
IO Architecture Evolution
Typical IO Controller Chip
  - Mixed Analog/Digital Physical Media Interface
  - Protocol Encapsulation, Error Handling
  - Flow Control, Interrupts, DMA, Buffering
Market Forces
  - Complex Controller Chips Are Cost Effective
    • Economy of Scale Outweighs Gate Utilization
  - Integrate Existing Chips into SoC
    • Minimize Engineering, Less "Risk"
Is a "Holistic" Solution Better?
CAVA IO Architecture
Processor Doubles as IO Controller
  - "Background" Context
    • 64 Registers, Overlapped Execution, Speculative
  - "Foreground" Context
    • 4 Registers, De-pipelined (1 Instruction / 4 Clocks)
    • Branch on IO Register Bit (e.g. Error, Data Ready…)
  - Dedicated Gates Only for the Physical Media Interface
Open Questions
  - Efficiency: Dedicated Logic, Interference
  - Need Good Abstraction
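To make "branch on IO register bit" concrete, here is a C sketch of one pass of a foreground IO loop in which each decision is a test of a single status bit. The bit positions, the names `IO_ERROR`, `IO_DATA_READY` and `io_poll_step`, and the Error-before-Data-Ready priority are hypothetical; the slide names only the two conditions. The issue-rate helper assumes the 600 MHz target from the roadmap slide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the slide only names "Error" and "Data Ready". */
#define IO_DATA_READY (1u << 0)
#define IO_ERROR      (1u << 1)

enum io_action { IO_IDLE, IO_READ_DATA, IO_HANDLE_ERROR };

/* One pass of a foreground IO loop: each test is a branch on a single
   IO-register bit. Checking Error first is an illustrative priority choice. */
static enum io_action io_poll_step(uint32_t status) {
    if (status & IO_ERROR)      return IO_HANDLE_ERROR;
    if (status & IO_DATA_READY) return IO_READ_DATA;
    return IO_IDLE;
}

/* Peak issue rate (MIPS) of one de-pipelined foreground context:
   1 instruction every 4 clocks. */
static unsigned foreground_mips(unsigned clock_mhz) {
    return clock_mhz / 4u;
}
```

At 600 MHz a single foreground context would issue at most 150 million instructions per second under this model.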
CAVA-1 Memory System
Initial Target
  - No L2 Cache
    • If 6-T Memory Cell, Bigger L1, Same Cycle Time
  - Not Rambus
    • Royalty Payment, RAC, Process Migration Difficulties
DRAM
  - FCRAM: "Slightly Better DRAM"
    • Cache Miss Penalty 50 ns = 30 Cycles @ 600 MHz
    • 64-bit Interface: 2.4 GB/s, 1 Byte/Cycle/CPU
  - Flexibility Very Important
    • Also Use Standard DDR SDRAM
    • 64-, 32- and 16-bit Wide (4, 2 or 1 Chips)
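The memory numbers can be cross-checked with a few lines of C. The transfer rate of 300 million transfers per second and the count of four CPUs are not on the slide; they are back-solved from 2.4 GB/s over a 64-bit bus and from 1 byte/cycle/CPU at 600 MHz, so treat them as assumptions. MB here means 10^6 bytes, and the function names are illustrative.

```c
#include <assert.h>

/* Miss penalty in CPU cycles: latency (ns) x clock (MHz) / 1000. */
static unsigned miss_penalty_cycles(unsigned latency_ns, unsigned clock_mhz) {
    return latency_ns * clock_mhz / 1000u;
}

/* Peak bandwidth in MB/s (10^6 bytes/s) for a bus `width_bits` wide
   running at `mega_transfers` million transfers per second. */
static unsigned peak_mb_per_s(unsigned width_bits, unsigned mega_transfers) {
    assert(width_bits % 8u == 0u);
    return width_bits / 8u * mega_transfers;
}

/* Bytes per CPU clock available to each CPU if bandwidth is shared evenly. */
static double bytes_per_cycle_per_cpu(unsigned mb_per_s, unsigned clock_mhz, unsigned cpus) {
    assert(clock_mhz > 0u && cpus > 0u);
    return (double)mb_per_s / clock_mhz / cpus;
}
```

Under the same assumed transfer rate, the 32- and 16-bit configurations would deliver 1.2 and 0.6 GB/s respectively.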
Multi-Thread Support
Memory Latency Impact
  - 5% Miss × 30 Cycles = 1.5 CPI: Half Idle
  - Significant Improvement, Lots of Silicon Area
2-Port SRAM as Register File
  - 64×64 → 256×64: +50% Area
  - CAVA-1: 3 Background, 16 Foreground
Interesting Questions
  - "Helper Thread" for Serial Program
  - Quantify Speculation: Area, Clock Speed, Perf
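The latency and register-file figures can be reproduced in C. Assuming a base CPI of 1.0 (not stated on the slide), 1.5 stall cycles per instruction leaves a single thread busy 40% of the time, a little worse than the slide's "half idle". Reading the register file as 256 entries, three 64-register background contexts plus sixteen 4-register foreground contexts fill it exactly (192 + 64 = 256). Integer "milli-CPI" avoids floating-point rounding; the function names are illustrative.

```c
#include <assert.h>

/* Stall cycles per 1000 instructions: misses per 1000 instructions x miss penalty. */
static unsigned stall_milli_cpi(unsigned misses_per_1000, unsigned penalty_cycles) {
    return misses_per_1000 * penalty_cycles;
}

/* Percent of cycles doing useful work for one thread, assuming base CPI = 1.0. */
static unsigned busy_percent(unsigned stall_milli) {
    return 100u * 1000u / (1000u + stall_milli);
}

/* Register-file entries needed to hold every context at once. */
static unsigned regfile_entries(unsigned bg_contexts, unsigned bg_regs,
                                unsigned fg_contexts, unsigned fg_regs) {
    return bg_contexts * bg_regs + fg_contexts * fg_regs;
}
```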
Project Status/Roadmap
Phase 1: Running (Limping :-)
  - gcc, as, ld, glibc, gdb, ISA emulator
Phase 2: Fall 2002
  - Linux Kernel, IO Controller Programs
    • Standalone PC Emulator, FPGA PCI Card for IO
    • Key Decisions re Verification Strategy
  - Quantitative Analysis of Clock Cycle
    • Target 600 MHz, 0.18µm (typical process, worst-case environment)
Phase 3: Spring 2003
  - Design RTL, Tapeout Party, Debug…
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
10
Ex Tag Match Critical PathEx Tag Match Critical Path
Dynamic vs Staticndash Adder 20 30ndash Tag 15 25ndash Data 20 25ndash Match 5 20
2-Cycle Loadndash D (20+15+5)2 = 20ndash S (30+25+20)2 = 38ndash 800 vs 400 MHz
bull Inv Delay = 63ps
Adder
DataTag
=
DataTag
=
30
25
20
CAVA Open Source Infrastructure for System-On-a-Chip Designs
11
Recurrance Forward SubstitutionRecurrance Forward Substitution
Replicate ALUndash (30+25)2 = 275ndash 600 MHz
bull vs 800 400 MHz
ndash 5K gates 01mm2
Opportunityndash Most CPU Knowledge
from Custom Designsndash Experiment Using Logic
Synthesis PlaceampRoute
Adder
DataTag
= Adder
DataTag
=3020
25 25
CAVA Open Source Infrastructure for System-On-a-Chip Designs
12
Instruction SetInstruction Set
RISC Lessons Many Registers
Everything Else
Code Densityndash Silicon CPU 3
MBytes of DRAMndash Applications Millions
of Lines of Code
24-bit Instructionsndash Two 64-reg Specifiers
32-bit address x -op
x -opimm
x opydisp
x -yop24-bit offset
6 468
x = x op y
x = x op imm
x = (y+disp)
x -yop
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
11
Recurrance Forward SubstitutionRecurrance Forward Substitution
Replicate ALUndash (30+25)2 = 275ndash 600 MHz
bull vs 800 400 MHz
ndash 5K gates 01mm2
Opportunityndash Most CPU Knowledge
from Custom Designsndash Experiment Using Logic
Synthesis PlaceampRoute
Adder
DataTag
= Adder
DataTag
=3020
25 25
CAVA Open Source Infrastructure for System-On-a-Chip Designs
12
Instruction SetInstruction Set
RISC Lessons Many Registers
Everything Else
Code Densityndash Silicon CPU 3
MBytes of DRAMndash Applications Millions
of Lines of Code
24-bit Instructionsndash Two 64-reg Specifiers
32-bit address x -op
x -opimm
x opydisp
x -yop24-bit offset
6 468
x = x op y
x = x op imm
x = (y+disp)
x -yop
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
12
Instruction SetInstruction Set
RISC Lessons Many Registers
Everything Else
Code Densityndash Silicon CPU 3
MBytes of DRAMndash Applications Millions
of Lines of Code
24-bit Instructionsndash Two 64-reg Specifiers
32-bit address x -op
x -opimm
x opydisp
x -yop24-bit offset
6 468
x = x op y
x = x op imm
x = (y+disp)
x -yop
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
13
Variable Length InstructionsVariable Length Instructions
Superscalar Paradigmndash Speculatively Decode Ignore Some
bull Middle vs Always Instructions At End
ndash Idiom Acceleration bull Set high 16bit + load with 16bit displacement
Key Featuresndash Lengths In Multiples (Semour Crayrsquos Parcel)
bull Else Many Wasted Decode Stations
ndash Opcode Register Fields in 1st Parcelbull Single-Issue Design Conserve Gates Power
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
14
Why Multiprocessor ArchitectureWhy Multiprocessor Architecture
Efficiencyndash Area
bull More Mips per Gate
ndash Powerbull Less Synchronousbull Effective Clock Gating
Ease of Usendash Real Time
bull vs Fancy Scheduling
ndash Customization
SuperscalarUniprocessor
MemCtrl
Eth1
Eth2PCI 1PCI 2
USB
CPU +Thin IO
CPU +Thin IO
CPU+ Thin IO
MemCtrl
ConventionalIntegration
CAVA-1
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
ConclusionConclusion
New Technologyndash Creation Only Beginningndash Adoption Critical Mass Understand Can Modify
bull Necessary to Evangelize to Share
Visionndash Modular Architectural Enhancements by Manyndash Invent Novel SoCrsquos Uninhibited by Lawyers
Itrsquos Funndash Periodic Open Releases of Compiler RTLndash Participate Build Computer System from Scratch
CAVA Open Source Infrastructure for System-On-a-Chip Designs
15
IO Architecture EvolutionIO Architecture Evolution
Typical IO Controller Chipndash Mixed AnalogDigital Physical Media Interfacendash Protocol Encapsulation Error Handlingndash Flow Control Interrupts DMA Buffering
Market Forcesndash Complex Controller Chips are Cost Effective
bull Economy of Scale Outweights Gate Utilization
ndash Integrate Existing Chips into SoCbull Minimize Engineering Less ldquoRiskrdquo
Is ldquoHolisticrdquo Solution Better
CAVA Open Source Infrastructure for System-On-a-Chip Designs
16
CAVA IO ArchitectureCAVA IO Architecture
Processor Doubles As IO Controllerndash ldquoBackgroundrdquo Context
bull 64 Registers Overlapped Execution Speculative
ndash ldquoForegroundrdquo Contextbull 4 Registers De-pipelined ( 1 Instruction 4 Clocks)bull Branch on IO Register Bit (eg Error Data Readyhellip)
ndash Dedicated Gates Only Physical Media Interface
Open Questionsndash Efficiency Dedicated Logic Interferencendash Need Good Abstraction
CAVA Open Source Infrastructure for System-On-a-Chip Designs
17
CAVA-1 Memory SystemCAVA-1 Memory System
Initial Targetndash No L2 Cache
bull If 6-T Memory Cell Bigger L1 Same Cycle Time
ndash Not Rambusbull Royalty Payment RAC Process Migration Difficulties
DRAMndash FCRAM ldquoSlightly Better DRAMrdquo
bull Cache Miss Penalty 50ns = 30 Cycles 600 MHzbull 64-bit Interface 24GBs 1 ByteCycleCPU
ndash Flexibility Very Importantbull Also Use Standard DDR SDRAMbull 64- 32- and 16-bit wide (4 2 or 1 chips)
CAVA Open Source Infrastructure for System-On-a-Chip Designs
18
Multi-Thread SupportMulti-Thread Support
Memory Latency Impactndash 5 miss 30 cycles = 15 cpi Half Idlendash Significant Improvement Lots of Silicon Area
2-Port SRAM as Register Filendash 64 64 256 64 +50 Areandash CAVA-1 3 Background 16 Foreground
Interesting Questionsndash ldquoHelper Threadrdquo for Serial Programndash Quantify Speculation Area Clock Speed Perf
CAVA Open Source Infrastructure for System-On-a-Chip Designs
19
Project StatusRoadmapProject StatusRoadmap
Phase 1 Running (Limping -)
ndash gcc as ld glibc gdb ISA emulator
Phase 2 Fall 2002ndash Linux Kernel IO Controller Programs
bull Standalone PC Emulator FPGA PCI Card for IObull Key Decisions re Verification Strategy
ndash Quantitative Analysis of Clock Cyclebull Target 600 MHz 018microm (typ process wc env)
Phase 3 Spring 2003ndash Design RTL Tapeout Party Debug hellip
CAVA Open Source Infrastructure for System-On-a-Chip Designs
20
ISA Research ToolISA Research Tool
Instruction Descriptionsndash Syntaxin
bull Lengths Field Definitions Opcode EncodingsiR imm[8] [6] x[6] [4=0] Immediate with Register
bR [8] y[6] x[6] [4=F] Binary Register Operator
ndash Semanticsinbull Gcc ldquomdrdquo Patterns Emulator C Statement
iRia addi x += imm Add Immediate
(set (match_operandDI ldquoregister_operandrdquo ldquo=rrdquo)
(plusDI (match_operandDI ldquoregister_operandrdquo ldquo0rdquo)
(match_operandDI ldquoimmediate_operandrdquo ldquoIrdquo))
Consistent gcc as ld gdb emulator doc
CAVA Open Source Infrastructure for System-On-a-Chip Designs
21
Conclusion
New Technology
– Creation Only Beginning
– Adoption: Critical Mass, Understand, Can Modify
• Necessary to Evangelize, to Share
Vision
– Modular Architectural Enhancements by Many
– Invent Novel SoC’s Uninhibited by Lawyers
It’s Fun
– Periodic Open Releases of Compiler, RTL
– Participate: Build Computer System from Scratch