NIOS II Processor
7/28/2019 NIOS II Processor
-
Outline
What is a Soft Processor?
What is the NIOS II?
NIOS II architecture and its implications
TigerSHARC vs. NIOS II
Pipeline issues
Issues related to FIR
Hardware acceleration using FPGA logic
-
What is a Soft Processor?
A processor implemented in VHDL, Verilog, etc., and downloaded onto FPGA hardware
Many parallel processors can be implemented on one FPGA
Additional FPGA resources on the same chip that are not part of the processor core can also be used
NIOS II is a soft processor
-
Why a Soft Processor?
Higher level of design reuse
Reduced obsolescence risk
Simplified design updates and changes
More design implementation options
Lower latency between the processor and FPGA components
-
What is NIOS II?
Software-defined processor
The processor core is loaded onto an FPGA
Programmed using normal programming tools (C, asm), not hardware description languages
The rest of the FPGA hardware can be used to accelerate parts of the code
-
How Is NIOS II Implemented?
The custom FPGA logic that interacts with the processor is implemented in Altera Quartus II
The Avalon interface bus (common instruction/data bus) is implemented in Quartus II
The architecture is generated in Quartus II and then used for programming in the Eclipse IDE
-
NIOS II IDE
Code is written in Eclipse rather than VisualDSP.
-
The Different NIOS II Cores
There are 3 cores available from Altera:
NIOS II/e: economy core
NIOS II/s: standard core
NIOS II/f: fast core
-
What's the Difference Between the Cores?
An LE (logic element) is a 4-input look-up table (LUT) plus 1 D flip-flop
An ALM (adaptive logic module) is equivalent to 2 LEs
-
Comparison of TigerSHARC and
NIOS II architecture
-
TigerSHARC Architecture
-
NIOS II Architecture
-thirty-two 32-bit general-purpose registers, six 32-bit control registers
-configurable cache size, limited by how much FPGA memory is available
-ALU: 32-bit, two inputs to one output; performs shifts, logic, and arithmetic. Unlike TigerSHARC, the shifter is not a separate unit
-
Avalon Interface
-separate address, data, and control lines
-data transfers up to 1024 bits wide; the width can be set to any value, not only a power of 2
-one transfer per clock cycle
-
NIOS II/f pipeline
Six stages
One instruction can be dispatched and/or retired per cycle
Dynamic branch prediction: 2-bit branch history table (no BTB as in TigerSHARC)
-
NIOS II/f pipeline
The pipeline stalls for:
Multi-cycle instructions
Cache misses
Data dependencies (2 cycles between
calculating and using result)
Mispredicted branch penalty: 3 cycles
-
Hardware multiply
The multiplier option is chosen at the processor design stage:
No hardware multiply (saves FPGA gates): speed depends on the algorithm
Embedded multipliers (if the FPGA has them): 1-5 cycles (depends on the FPGA)
Multipliers implemented in FPGA logic: 11 cycles
Division: 4-66 cycles in hardware
-
Compared to TigerSHARC
No support for parallel instructions
No support for SIMD operations
Multicycle instructions stall the pipeline
All the above limitations can be overcome
by using FPGA space unoccupied by the
processor itself
-
Comparison of NIOS II and
TigerSHARC on an FIR Algorithm
-
Integer FIR algorithm
int coeff[]={1, 2, 3, 4, 5, 6, 7, 8};
int data1[] = {1, 0, 0, 0, 0, 0, 0, 0};
int output[8];
int i=0, j=0, k=0;
for(k=0; k
-
Speed analysis
0 movi r4,8               i = 8
1 Loop: ldw r2,0(r6)      load data
2 ldw r3,0(r7)            load coefficient
3 addi r4,r4,-1           i--
4 addi r6,r6,4            dataPt++
5 mul r2,r2,r3            data = data * coeff
6 addi r7,r7,-4           coeffPt--
7 (stall)                 data stall waiting for the multiplication result
8 add r5,r5,r2            output += data
9 bne r4,zero,0x10002a0   loop back while i != 0
The branch will mispredict 2 times at the beginning of the loop and 1 time at the end (wasting 3 cycles each time)
-
Speed analysis
9 cycles per iteration, except the first two (branch predicted not taken) and the last (branch predicted taken), which take 9+3 = 12 cycles
The data stall can be removed by moving the instruction on line 4 to line 7, giving 8 cycles per iteration (11 for the mispredicted ones)
Speed: 8 cycles * (N-3) + 11 cycles * 3 = 8*(N-3)+33 cycles
For a 1024-tap FIR: 8201 cycles
The NIOS II clock cycle is 3 times longer than TigerSHARC's (200 MHz vs. 600 MHz)
-
Speed comparison
8201 NIOS II cycles are equivalent to 24603 TigerSHARC cycles
Lab3 timing:
56000 cycles: debug mode
13000 cycles: unoptimized ASM
4000 cycles: optimized ASM
Worse than the unoptimized assembly, but no hardware acceleration is used, so this is not that bad
-
Hardware Acceleration
The profiling tool in Eclipse can show how long each function takes
If a function takes too long, it can be sped up with:
Custom instructions
Hardware acceleration
Hardware acceleration takes the function and transforms it into FPGA circuitry
-
Hardware Acceleration
Can be done using the C2H compiler from Altera
Trades logic size for speed-up
Table 1. User Application Results Example
Algorithm                    | Speed Increase (vs. Nios II CPU) | System fMAX (MHz) | System Resource Increase (1)
Autocorrelation              | 41.0x                            | 115               | 124%
Bit Allocation               | 42.3x                            | 110               | 152%
Convolution Encoder          | 13.3x                            | 95                | 133%
Fast Fourier Transform (FFT) | 15.0x                            | 85                | 208%
High Pass Filter             | 42.9x                            | 110               | 181%
Matrix Rotate                | 73.6x                            | 95                | 106%
RGB to CMYK                  | 41.5x                            | 120               | 84%
RGB to YIQ                   | 39.9x                            | 110               | 158%
http://www.altera.com/products/ip/processors/nios2/tools/c2h/ni2-c2h.html
-
Conclusion
Soft processors such as the NIOS II offer another alternative in the embedded systems scene.
The NIOS II offers added configurability and customization that blur the line between FPGAs and DSPs.