2016-01-05 VectorBlox ORCA RISC-V DEMO · 1/5/2016 · Tiny, Low-Power FPGA 3,500 LUT4s 4 MUL16s <...

FPGA-Optimized ORCA

Tiny, Low-Power FPGA: 3,500 LUT4s, 4 MUL16s, < $5.00

ISA: RV32IM (hw multiply, sw divider)

< 2,000 LUTs, ~20 MHz

ORCA FPGA-Optimized

What is ORCA?

•  Family of RISC-V implementations
   –  Highly parameterized
   –  Ideally suited for FPGAs
   –  Portable across FPGA vendors
   –  BSD license, open source hardware

Why ORCA?

•  Many reasons
   –  Orcas travel in pods: family of many sizes
   –  Orcas are native to Vancouver
•  ORCA: many possible backronyms
   –  ORCA RISC-V Computer Architecture
   –  ORCA Reconfigurable CPU Architecture
   –  Optimized RISC-V CPU Architecture

ORCA: Multiple FPGA Vendors

•  Altera
   –  Drop-in Qsys replacement for Nios II/f
   –  Avalon I / D masters
•  Lattice
   –  Wishbone I / D masters
•  Xilinx, Microsemi
   –  Coming soon

ORCA RISC-V RV32I on Different FPGAs

Area        2008 LUT4   1623 LUT4   541 ALMs
Fmax        22 MHz      109 MHz     244 MHz
DMIPS       n/a         79 MIPS     212 MIPS
DMIPS/MHz   n/a         0.73        0.87

ORCA vs Other RISC-V

            ORCA RV32IM              Z-scale RV32IM          PicoRV RV32I
Area        2353 LUT4 (Cyclone IV,   2678 LUT4 (Spartan 6,   2949 LUT4 (Cyclone IV,
Fmax        125 MHz                  33 MHz                  127 MHz
DMIPS       122 MIPS                 44 MIPS                 39 MIPS
DMIPS/MHz   0.98 (measured)          1.35 (claimed)          0.31 (claimed)

ORCA RISC-V vs FPGA CPUs

            ORCA RV32IM               Altera Nios II/f
Area        2353 LUT4 (Cyclone IV)    2678 LUT4 (Cyclone IV)
Fmax        125 MHz                   140 MHz
DMIPS       122 MIPS                  163 MIPS
DMIPS/MHz   0.98 (measured)           1.16 (claimed)
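The DMIPS/MHz rows in the comparison tables appear to be the DMIPS row divided by Fmax. A minimal sketch of that check, using only numbers from the slides; the dictionary layout and the `dmips_per_mhz` helper are illustrative names, not part of ORCA. Small differences from the listed figures are expected, since the slide values are rounded and some are vendor claims.

```python
# (DMIPS, Fmax in MHz, DMIPS/MHz as listed on the slides)
cores = {
    "ORCA RV32IM": (122, 125, 0.98),
    "Z-scale RV32IM": (44, 33, 1.35),
    "PicoRV RV32I": (39, 127, 0.31),
    "Altera Nios II/f": (163, 140, 1.16),
}


def dmips_per_mhz(dmips, fmax_mhz):
    """Clock-normalised performance: DMIPS divided by clock frequency in MHz."""
    return dmips / fmax_mhz


for name, (dmips, fmax, listed) in cores.items():
    derived = dmips_per_mhz(dmips, fmax)
    print(f"{name:18s} derived {derived:.2f}   listed {listed:.2f}")
```

Normalising by clock frequency separates what the microarchitecture achieves per cycle from what the FPGA fabric achieves in Fmax.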

RISC-V: Architecture Space

•  Width (3 choices)
   –  32, 64, 128 bits
•  Instruction Set (9 binary options, 2^9 choices)
   –  Minimum: I
   –  Binary options: M, A, F, D (== G), Q, L, B, T, P
•  Instruction Encoding (2 choices)
   –  C
•  Architecture Space
   3 x 2^10 = 3072 possibilities

ORCA Implementation Space

•  Logic design (12 choices)
   –  Multiplier (sw, hw)
   –  Divider (sw, hw)
   –  Shifter (1-cycle, 8-cycles, 32-cycles)
•  Counters (3 choices)
   –  0, 32, or 64 bits
•  Pipelining (2 choices)
   –  4 or 5 stages
•  Forwarding (2 choices)
   –  ALU only
   –  ALU + other units
•  Implementation space
   12 x 3 x 2 x 2 = 144 possibilities
•  Overall arch. x impl. = 3072 x 144 / 2 = 221,184 possibilities
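The counts above can be reproduced by enumerating the options. The following is a minimal sketch; the option names (`shifter_cycles`, `alu_only`, and so on) are labels made up for illustration and are not ORCA's actual parameter names. The division by 2 in the overall figure is taken from the slide as given, since the slide does not explain it.

```python
from itertools import product

# Architecture space, as counted on the slide.
widths = [32, 64, 128]
extensions = ["M", "A", "F", "D", "Q", "L", "B", "T", "P"]  # 9 binary options on top of I
encodings = ["standard", "compressed (C)"]

arch_points = len(widths) * 2 ** len(extensions) * len(encodings)

# ORCA implementation space (hypothetical option labels).
impl_options = {
    "multiplier": ["sw", "hw"],
    "divider": ["sw", "hw"],
    "shifter_cycles": [1, 8, 32],
    "counter_bits": [0, 32, 64],
    "pipeline_stages": [4, 5],
    "forwarding": ["alu_only", "alu_plus_other_units"],
}
impl_points = list(product(*impl_options.values()))

# The slide halves the product without stating why; reproduced as-is.
overall = arch_points * len(impl_points) // 2

print(arch_points, len(impl_points), overall)  # 3072 144 221184
```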

Huge Design Space

•  Implementation on ASIC
   –  Need to choose one design point in the architecture + implementation space
   –  Benefit: user has no choice
   –  Problem: compromise across many applications
•  Implementation on FPGA
   –  Can have fully parameterized design
   –  User can choose best architecture + implementation according to application
   –  Benefit: good performance, area
   –  Problem: overwhelming design space

32b vs 64b Counters

[Figure: plot of Execution Time (Faster to Slower) against LUT6 count (Smaller to Bigger) for three configurations: no counters, 32b counters, 64b counters]

4 vs 5 Pipeline Stages

[Figure: plot of Execution Time (Faster to Slower) against LUT4 count (Smaller to Bigger) for two configurations: 4 pipeline stages, 5 pipeline stages]

FPGA => ASIC, but ASIC !=> FPGA

•  A good FPGA implementation often leads to a good ASIC implementation
•  A good ASIC implementation often leads to a poor FPGA implementation

Register File

•  Discrete FFs: inefficient
   –  32 cpu registers x 32 b = 1024 FFs
   –  32 mux32 = 32 x 11 LUT4 = 352 LUT4s
•  Note: muxes are costly, must avoid!!!

[Figure: register file with registers x0, x1, x2, ... x31; bit positions up to XLEN-1]

Cost of muxes (1b wide):
   mux4:   2 LUT4 or 1 LUT6
   mux16: 10 LUT4 or 5 LUT6
   mux32: 11 LUT4 or 5.5 LUT6
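The discrete-FF arithmetic above can be written out as a small cost model. This is a sketch only: the per-bit mux costs are copied from the slide as given, while the function name and the `read_ports` parameter are assumptions added for illustration (the slide's 352-LUT4 figure corresponds to a single 32-bit read mux).

```python
# Per-bit mux costs copied from the slide: {inputs: {LUT type: LUT count}}
MUX_COST = {
    4: {"LUT4": 2, "LUT6": 1},
    16: {"LUT4": 10, "LUT6": 5},
    32: {"LUT4": 11, "LUT6": 5.5},
}


def discrete_regfile_cost(num_regs=32, xlen=32, read_ports=1, lut="LUT4"):
    """Return (flip-flops, LUTs) for a register file built from discrete FFs.

    read_ports is a hypothetical knob; read_ports=1 matches the slide's count.
    """
    flip_flops = num_regs * xlen
    # One num_regs-to-1 mux per data bit, per read port.
    luts = read_ports * xlen * MUX_COST[num_regs][lut]
    return flip_flops, luts


ffs, luts = discrete_regfile_cost()
print(ffs, luts)  # 1024 352
```

Under the same assumptions, a second read port would double the mux cost, which is part of why block RAM is attractive for the register file.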

Register File Implications

•  Block RAMs: dual ported, registered output
   –  Use 1 RD, 1 WR port
   –  Use data-out FFs as pipeline FFs
   –  Needs external data-forwarding

[Figure: dual-ported block RAM with Port 1 and Port 2, signals rd-data1, wr-data1, rd-data2, wr-data2. Labels: RS or RT, OPERAND A or B, WB DATA; the remaining signals are marked UNUSED.]

RAM cannot forward WB data to OPERAND data internally
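Why the RAM needs external forwarding can be shown with a small behavioural model. This is a sketch under stated assumptions, not ORCA's implementation: the class and function names are hypothetical, and the RAM is assumed to return the old contents when the same address is read and written in one clock (actual same-address behaviour depends on the FPGA family).

```python
class BlockRamRegFile:
    """One-read, one-write RAM whose read data appears in a registered output."""

    def __init__(self, depth=32):
        self.mem = [0] * depth
        self.rd_data = 0  # registered output, doubles as a pipeline FF

    def clock(self, rd_addr, wr_en=False, wr_addr=0, wr_data=0):
        # Assumed read-before-write: the read samples the OLD contents.
        self.rd_data = self.mem[rd_addr]
        if wr_en and wr_addr != 0:  # x0 stays zero
            self.mem[wr_addr] = wr_data


def read_operand(rf, rd_addr, wr_en, wr_addr, wr_data):
    """Clock the RAM once, forwarding the write-back value on an address match."""
    rf.clock(rd_addr, wr_en, wr_addr, wr_data)
    if wr_en and wr_addr == rd_addr and rd_addr != 0:
        return wr_data  # external forwarding mux picks WB data
    return rf.rd_data


rf = BlockRamRegFile()
rf.clock(rd_addr=0, wr_en=True, wr_addr=5, wr_data=111)  # x5 = 111
rf.clock(rd_addr=5, wr_en=True, wr_addr=5, wr_data=222)  # read x5 while writing 222
stale = rf.rd_data  # RAM alone returns the old value
forwarded = read_operand(rf, 5, True, 5, 333)  # forwarding mux returns the new value
print(stale, forwarded)  # 111 333
```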

ORCA Datapath

[Figure: pipeline diagram with stages InstrFetch, Decode, Ex/Mem, WB and a Forward stage (can be collapsed into Ex/Mem stage). Blocks shown: ALU / BR / SLT, Instr. Avalon/Wishbone, Data Avalon/Wishbone.]

Some FPGA Suggestions

•  RV32E spec
   –  Reduced # registers saves nothing in FPGAs
   –  Divide is expensive
•  Software
   –  Beware, shifts may be 1b/cycle (slow)
•  Privileged Arch spec
   –  Too many CSRs, 64b counters too big
      •  Increases pressure on multiplexers
   –  Suggest small / med / full versions
      •  No “official” rules on what to include/exclude to reduce size
      •  Eg, hypervisors not likely to run on FPGAs

Conclusions

•  ORCA RISC-V family is free, portable, FPGA-optimized
   –  FPGA and ASIC optimizations are different
      •  FPGA architecture dictates certain design choices
   –  Some RISC-V decisions are “unconsciously” aimed towards ASIC implementation
      •  These do not lead to good FPGA implementations
      •  But good FPGA choices lead to good ASICs

Free FPGA Hardware!

•  Today only: Lattice donating FPGA boards for RISC-V users
•  ORCA RV32I system ~2000 LUTs
   http://www.github.com/VectorBlox/risc-v
   About 1500 LUTs available for user I/O (eg, UART)

LUNCHTIME

So Long, and Thanks for All the Fish!!
