1
ALTERA FPGAs and NIOSII
ELG6158 Computer Systems Architecture
Miodrag Bolic
2
Presentation Outline
• Basic description of Stratix Altera Devices
• NIOS II processor architecture
• How to design a system using NIOS II processor
3
Stratix EP1S10 [2]
4
5
6
TriMatrix™ Memory [1]
• M512 Blocks: 512 bits per block + parity
  – Small FIFOs, Shift Register, Rake Receiver Correlator, FIR Filter Delay Line
• M4K Blocks: 4 Kbits per block + parity
  – Header / Cell Storage, Channelized Functions, ATM Cell/Packet Processing, Nios Program Memory
• M-RAM: 512 Kbits per block + parity
  – Packet / Data Storage, Nios Program Memory, System Cache, Video Frame Buffers, Echo Canceller Data Storage
• Dedicated External Memory Interface
  – Look-Up Schemes, Packet & Cell Buffering, Cache
• Trade-off across block types: More Bits for Larger Memory Buffering vs. More Data Ports for Greater Memory Bandwidth
7
Memory Bandwidth Summary: Stratix Device Family [1]
Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10          920,448              1           60            94                  1,245,024
EP1S20        1,669,248              2           82           194                  2,096,928
EP1S25        1,944,576              2          138           224                  2,894,400
EP1S30        3,317,184              4          171           295                  3,750,192
EP1S40        3,423,744              4          183           384                  4,384,800
EP1S60        5,215,104              6          292           574                  6,762,528
EP1S80        7,427,520              9          364           767                  8,784,720
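The totals in this table follow directly from the block counts. The sketch below assumes one parity bit per 8 data bits (so M512 = 576, M4K = 4,608, M-RAM = 589,824 bits), which reproduces the Total RAM Bits column exactly. The per-block bandwidth constants (43,200, 11,232 and 5,616 Mbps) are not from a datasheet; they are values solved from the Maximum Bandwidth column and reproduce every row. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Bits per block including parity, assuming one parity bit per byte. */
enum {
    MRAM_BITS = 589824,  /* 512 Kbits + parity (64K x 9) */
    M4K_BITS  = 4608,    /* 4 Kbits + parity (512 x 9)   */
    M512_BITS = 576      /* 512 bits + parity (64 x 9)   */
};

/* Per-block bandwidth in Mbps, fitted to the table above (assumption). */
enum {
    MRAM_MBPS = 43200,
    M4K_MBPS  = 11232,
    M512_MBPS = 5616
};

/* Hypothetical description of one device: its memory block counts. */
typedef struct {
    const char *name;
    int mram, m4k, m512;
} stratix_device;

/* Total RAM bits = sum over block types of (count x bits per block). */
uint64_t total_ram_bits(const stratix_device *d)
{
    return (uint64_t)d->mram * MRAM_BITS
         + (uint64_t)d->m4k  * M4K_BITS
         + (uint64_t)d->m512 * M512_BITS;
}

/* Maximum bandwidth = sum over block types of (count x per-block Mbps). */
uint64_t max_bandwidth_mbps(const stratix_device *d)
{
    return (uint64_t)d->mram * MRAM_MBPS
         + (uint64_t)d->m4k  * M4K_MBPS
         + (uint64_t)d->m512 * M512_MBPS;
}
```

For EP1S10 (1 M-RAM, 60 M4K, 94 M512 blocks) this gives 920,448 bits and 1,245,024 Mbps, matching the first row.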
8
9
Logic Array Blocks (LAB) [2]
• 10 LEs
• Local Interconnect
• LAB-Wide Control Signals
Figure: a LAB of ten LEs (LE1 to LE10), each receiving 4 inputs from the local interconnect, plus LAB-wide control signals.
10
LAB Arrangement
• LABs Communicate Directly to Each Other & Other Blocks Both Horizontally & Vertically
Figure: LABs and M512 blocks arranged along a LAB row and a LAB column.
11
Logic Elements
• Smallest Units of Logic
• Used for Combinatorial/Registered Logic
Figure: Stratix™ LE with carry-in and carry-out, LUT chain input and output, register chain input and output, and connections to general routing and local routing.
12
Total LE Resources
Device Total LEs
EP1S10 10,570
EP1S20 18,460
EP1S25 25,660
EP1S30 32,470
EP1S40 41,250
EP1S60 57,120
EP1S80 79,040
13
LE Datasheet Image
14
LE Features
• 4-Input Look-Up Table (LUT)
• Configurable Register
• 2 Operation Modes
• Dynamic Add/Subtract Control
• Carry-Select Chain Logic
• Performance-Enhancing Features
  – LUT & Register Chain
• Area-Enhancing Features
  – Register Packing & Feedback
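A 4-input LUT is a 16-entry truth table: the four data inputs form an index, and the configuration bits stored for that index are the output. The sketch below models this in C; the bit ordering (data1 as least significant index bit) and the names `lut4`, `lut4_mask` and `xor4` are assumptions for illustration, not Altera's configuration format.

```c
#include <stdint.h>

/* Evaluate a 4-input LUT. `mask` holds the 16 configuration bits:
 * bit i is the output for the input pattern whose binary value is i,
 * with data1 as the least significant bit (assumed ordering). */
int lut4(uint16_t mask, int data1, int data2, int data3, int data4)
{
    unsigned index = (unsigned)(data1 & 1)
                   | (unsigned)(data2 & 1) << 1
                   | (unsigned)(data3 & 1) << 2
                   | (unsigned)(data4 & 1) << 3;
    return (mask >> index) & 1;
}

/* "Program" a LUT: evaluate any 4-input Boolean function on all 16
 * input patterns and record the results as the truth-table mask. */
uint16_t lut4_mask(int (*f)(int, int, int, int))
{
    uint16_t mask = 0;
    for (unsigned i = 0; i < 16; i++)
        if (f(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1))
            mask |= (uint16_t)(1u << i);
    return mask;
}

/* Example function: 4-input XOR (odd parity). */
int xor4(int a, int b, int c, int d) { return a ^ b ^ c ^ d; }
```

Any function of four inputs fits in one LUT: `lut4_mask(xor4)` yields 0x6996, and a 4-input AND is simply 0x8000. Wider functions need several LUTs, which is where the LUT chain helps.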
15
LE Inputs/Outputs
• Inputs
  – 4 Data
  – 2 LE Carry-Ins & 1 LAB Carry-In
  – 1 Dynamic Addition/Subtraction Control
  – Register Controls
• Outputs
  – 2 LE Carry-Outs
  – 2 Row/Column/DirectLink Outputs
  – 1 Local Output
  – 1 LUT Chain & 1 Register Chain
16
Operation Modes
• Normal
  – General Combinatorial or Registered Logic
• Dynamic Arithmetic
  – Used for Adders, Counters, Accumulators, Comparators
  – Uses Carry Chain for Faster Operation
• Chosen Automatically by Quartus® II & NativeLink® Synthesis Tools
  – Based on Design & Design Constraints
17
LE Register Controls
• Clock/Clock Enable
• Synchronous & Asynchronous Clear
• Synchronous & Asynchronous Load & Data
• Asynchronous Preset
  – Preset Function Loads a ‘1’
Figure: LE register with ALD/PRE, ADATA, D, Q, ENA and CLRN pins.
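The register controls listed above can be summarized as a small behavioral model. This is only a sketch: the struct fields are hypothetical names for the slide's controls, and the priority order (asynchronous clear over preset over asynchronous load; then clock enable, synchronous clear, synchronous load) is an assumption, not taken from the Stratix datasheet.

```c
#include <stdbool.h>

/* Hypothetical control inputs for one LE register. */
typedef struct {
    bool aclr;   /* asynchronous clear (CLRN asserted)      */
    bool apre;   /* asynchronous preset: loads a '1'        */
    bool aload;  /* asynchronous load of adata (ALD)        */
    bool adata;  /* asynchronous load data (ADATA)          */
    bool ena;    /* clock enable (ENA)                      */
    bool sclr;   /* synchronous clear                       */
    bool sload;  /* synchronous load                        */
    bool sdata;  /* data for the synchronous load           */
} le_reg_ctrl;

/* Asynchronous controls act without a clock edge. Returns true and
 * writes *q if one of them is active (assumed priority order). */
bool le_reg_async(const le_reg_ctrl *c, bool *q)
{
    if (c->aclr)  { *q = false;    return true; }
    if (c->apre)  { *q = true;     return true; }
    if (c->aload) { *q = c->adata; return true; }
    return false;
}

/* Next register value on a rising clock edge; d is the normal data
 * input (e.g. the LUT output). */
bool le_reg_clock(const le_reg_ctrl *c, bool q, bool d)
{
    bool async_q = q;
    if (le_reg_async(c, &async_q))
        return async_q;   /* asynchronous controls override the clock */
    if (!c->ena)
        return q;         /* clock enable low: hold current value */
    if (c->sclr)
        return false;     /* synchronous clear */
    if (c->sload)
        return c->sdata;  /* synchronous load */
    return d;             /* normal capture */
}
```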
18
Normal Mode
Figure: normal mode LE. A 4-input LUT (inputs data1 to data4 and cin, with LUT chain input and output) feeds the register, which has sync load & clear logic, DDATA, register control signals, register chain input/output and register feedback; outputs drive row, column & DirectLink routing and local routing.
Note: 1) Functional diagram only. Please see the datasheet for more details.
2) addnsub and data1 are connected via XOR logic.
19
Combinatorial Logic Only
Figure: the normal mode LE diagram, highlighting the combinatorial (LUT-only) path.
20
Sequential Logic Only
Figure: the normal mode LE diagram, highlighting the sequential (register-only) path.
21
Dynamic Arithmetic Mode
Figure: dynamic arithmetic mode LE. data1, data2 and data3, together with addnsub, feed a sum calculator and a carry calculator; the LAB carry-in selects between Carry-In0 and Carry-In1, and the carry-out logic produces Carry-Out0 and Carry-Out1. The register (sync load & clear logic, DDATA, register control signals, register chain input/output) drives row, column & DirectLink routing and local routing.
Note: Functional diagram only. Please see the datasheet for more details.
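Dynamic add/subtract control lets one carry chain compute either a + b or a − b, using the two's-complement identity a − b = a + ~b + 1: one operand is XORed with the control signal and the chain's carry-in supplies the +1. The sketch assumes addnsub = 1 selects addition and 0 selects subtraction (reading the name as "add, not subtract"); which LE input is actually inverted in silicon is not modeled, and the function names are hypothetical.

```c
#include <stdint.h>

/* One full-adder bit, as computed by one LE in arithmetic mode. */
static unsigned full_add(unsigned a, unsigned b, unsigned cin, unsigned *cout)
{
    *cout = (a & b) | (a & cin) | (b & cin);
    return a ^ b ^ cin;
}

/* width-bit adder/subtractor built as a ripple carry chain (width <= 32).
 * Assumed polarity: addnsub == 1 adds, addnsub == 0 subtracts. */
uint32_t add_sub(uint32_t a, uint32_t b, unsigned addnsub, unsigned width)
{
    unsigned invert = addnsub ? 0u : 1u;  /* subtract: invert b        */
    unsigned carry = invert;              /* +1 completes ~b + 1 = -b  */
    uint32_t result = 0;
    for (unsigned i = 0; i < width; i++) {
        unsigned ai = (a >> i) & 1u;
        unsigned bi = ((b >> i) & 1u) ^ invert;  /* XOR with the control */
        unsigned cout;
        result |= (uint32_t)full_add(ai, bi, carry, &cout) << i;
        carry = cout;                     /* ripple to the next LE */
    }
    return result;                        /* modulo 2^width */
}
```

For example, with width 8, `add_sub(7, 5, 1, 8)` adds to 12, while `add_sub(5, 7, 0, 8)` subtracts and wraps to 254 (−2 in 8-bit two's complement).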
22
Carry-Select Logic
• Each Cell Pre-Calculates Sum & Carry-Out for Carry = 1 & Carry = 0
• Carry-In Selects which Pre-Calculation Is Used
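Figure: a single LUT precomputes A0+B0+1 (giving COUT1) and A0+B0+0 (giving COUT0); CIN drives a 0/1 select that chooses which result appears on SUMOUT and COUT.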
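The carry-select idea can be sketched in C: every bit position computes its sum and carry-out for both possible carry-ins before the carry arrives, so the arriving carry only drives a select. This is a behavioral model of the technique on the slide, not a gate-level description of the Stratix LE; `cs_cell`, `cs_precompute` and `cs_add` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Both precomputed outcomes of one LE (a and b are single bits). */
typedef struct {
    unsigned sum0, cout0;  /* assuming carry-in = 0: A+B+0 */
    unsigned sum1, cout1;  /* assuming carry-in = 1: A+B+1 */
} cs_cell;

cs_cell cs_precompute(unsigned a, unsigned b)
{
    cs_cell c;
    c.sum0 = a ^ b;        c.cout0 = a & b;
    c.sum1 = (a ^ b) ^ 1u; c.cout1 = a | b;
    return c;
}

/* width-bit addition (width <= 32). CIN only selects between the
 * precomputed pairs; it never passes through the sum logic itself. */
uint32_t cs_add(uint32_t a, uint32_t b, unsigned cin, unsigned width,
                unsigned *cout)
{
    uint32_t sum = 0;
    unsigned carry = cin & 1u;
    for (unsigned i = 0; i < width; i++) {
        cs_cell c = cs_precompute((a >> i) & 1u, (b >> i) & 1u);
        sum |= (uint32_t)(carry ? c.sum1 : c.sum0) << i;  /* SUMOUT */
        carry = carry ? c.cout1 : c.cout0;                /* COUT   */
    }
    if (cout != NULL)
        *cout = carry;
    return sum;
}
```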
23
Carry Chain Details
• Carry Chains Begin & End in Any LE
• 2 Carry Chains Can Exist In Any LAB
• Carry-Select Generated in LEs 5 & 10
  – Every LE Not in Critical Timing Path
Figure: a 10-LE carry chain in which LE1 to LE10 add A1/B1 through A10/B10 to produce Sum1 to Sum10; the LAB carry-in and a second 0/1 select stage choose between carry paths, ending at the LAB carry-out.
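At the LAB level, the same idea is applied to groups of LEs: each group of five LEs produces its result for both possible incoming carries, and the carry arriving at the group (the LAB carry-in for LE1 to LE5, the group carry-out for LE6 to LE10) only selects one. The sketch below is a simplified behavioral model of a 10-bit LAB add under that assumption; `lab_add10` and `GROUP` are hypothetical names.

```c
#include <stdint.h>

#define GROUP 5  /* LEs per carry-select group: LE1-5 and LE6-10 */

/* 10-bit add across one LAB. Each 5-LE group computes two results in
 * parallel (carry-in 0 and carry-in 1); the incoming carry selects one,
 * so the carry crosses each group as a single select step. */
uint32_t lab_add10(uint32_t a, uint32_t b, unsigned lab_cin, unsigned *lab_cout)
{
    unsigned carry = lab_cin & 1u;
    uint32_t sum = 0;
    for (unsigned g = 0; g < 2; g++) {
        unsigned shift = g * GROUP;
        uint32_t ga = (a >> shift) & 0x1Fu;
        uint32_t gb = (b >> shift) & 0x1Fu;
        uint32_t s0 = ga + gb;         /* precomputed with carry-in 0 */
        uint32_t s1 = ga + gb + 1u;    /* precomputed with carry-in 1 */
        uint32_t s = carry ? s1 : s0;  /* select at LE5 / LE10        */
        sum |= (s & 0x1Fu) << shift;
        carry = (s >> GROUP) & 1u;     /* group carry-out             */
    }
    *lab_cout = carry;                 /* LAB carry-out               */
    return sum;
}
```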
24
LUT & Register Chains
• LUT Chain
  – Output of LUT Connects Directly to LUT Below
  – Available Only In Normal Mode
  – Ex. Wide Fan-In Functions
• Register Chain
  – Output of Register Connects Directly to Register Below (Shift Register)
  – LUT Can Be Used for Unrelated Function
  – Ex. LE Shift Register
• Both Chains End at LAB Boundary
Figure: LUT chain and register chain running from LE1 through LE2 to LEs 3 - 10.
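Because each register can feed the register below it directly, the ten registers of a LAB can act as a 10-stage shift register without using the LUTs or general routing. The sketch below models that behavior; the struct and function names are hypothetical, and it ignores the (assumed independent) LUTs that remain free for other logic.

```c
#include <stdint.h>

#define LES_PER_LAB 10

/* Register chain of one LAB: q[0] is LE1, q[9] is LE10. */
typedef struct {
    uint8_t q[LES_PER_LAB];
} reg_chain;

/* One clock edge: each register takes the value of the register above
 * it. The chain ends at the LAB boundary, so the bit leaving LE10 is
 * returned to the caller. */
uint8_t reg_chain_clock(reg_chain *r, uint8_t in)
{
    uint8_t out = r->q[LES_PER_LAB - 1];
    for (int i = LES_PER_LAB - 1; i > 0; i--)
        r->q[i] = r->q[i - 1];
    r->q[0] = in & 1u;
    return out;
}
```

A bit shifted into LE1 therefore emerges from LE10 on the eleventh clock edge.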
25
Stratix Interconnects
• Global Signals
• LE & Register Chains
• Carry Chains
• Local Interconnect
• DirectLink™
• MultiTrack Interconnects
  – Row Interconnects
  – Column Interconnects
26
Local Interconnect
• Groups 10 LEs Together
• Provides Input Signals to Blocks (LABs, Memory, DSP Blocks)
• Number of Local Lines Depends on Block
Figure: local interconnect feeding a LAB and an M512 block.
27
DirectLink
• Allows Blocks to Drive Local Interconnects of Neighboring Blocks in the Same Row
Figure: two LABs (LE1 to LE10 each) and an M512 block in one row, each able to drive the local interconnect of its neighbors.
28
DirectLink (cont.)
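• Provides Fast Communication between Neighboring Blocks
  – One LE Has Fast Access to Up to 29 Other LEs in Its Area: the other 9 LEs in its own LAB plus the 10 LEs in each of the left and right neighboring LABs
• Saves Row Resources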
29
MultiTrack Interconnect Architecture
• Provides Connections between All Device Blocks
• Series of 3 Types of Continuous Row & Column Interconnects
  – Each Has a Fixed Speed and Length
  – Constant Performance Across Family Members within Given Area
  – Simplifies Block Design
• Same Routing Resources Available Regardless of Location
30
Row Resources
• 3 Row Interconnect Lengths
  – R4
  – R8
  – R24
Figure: R4 (spanning 4 LABs), R8 and R24 row interconnects, labeled 160, 48 and 24 lines wide.
31
Row Resources (cont.)
• Each Block Has Own Row Resource to Drive Right and Left
Figure: R4 routing lines driving right and driving left from each block.
32
Row Resource Details
• R4
  – Terminate at M-RAM
• R8
  – Only Connect to Local & R8/C8 Interconnects
  – Terminate at M-RAM
  – Faster than 2 R4s
• R24
  – Do Not Interface with Blocks Directly
  – Can Cross M-RAM
  – Fastest Resource for Long Connections (Ex. Design Block to Design Block)
33
Column Resources
• 3 Interconnect Lengths
  – C4
  – C8
  – C16
• Features Similar to Row Interconnects
  – Each Block Has Column Resource to Drive Up and Down
  – Interconnects Are Staggered
  – Interconnects Can Drive End-to-End
Figure: C4 (spanning 4 LABs), C8 and C16 column interconnects.
34
Presentation Outline
• Basic description of Stratix Altera Devices
• NIOS II processor architecture
• How to design a system using NIOS II processor
35
36
NIOS II Overview [3]
• Soft IP Core
  – A soft-core processor is a microprocessor fully described in software, usually in an HDL, which can be synthesized in programmable hardware such as FPGAs.
• Reduced Instruction Set Computer (RISC)
• Non-pipelined, 5-stage, or 6-stage pipeline configurations
• Full 32-bit instruction set, data path, and address space
• 32 general-purpose registers
• 32 external interrupt sources
• Access to a variety of on-chip peripherals, and interfaces to off-chip memories and peripherals
• Software development environment based on the GNU C/C++ tool chain and Eclipse IDE
37
NIOS II Scalability
• Powerful multiprocessing systems can be built
38
NIOS II Processor Core [3]
39
Implementation
• The functional units of the Nios II architecture form the foundation for the Nios II instruction set.
• The Nios II architecture describes an instruction set, not a particular hardware implementation.
• Trade-offs:
  – More or less of a feature: e.g., the amount of instruction cache memory
  – Inclusion or exclusion of a feature: e.g., the JTAG debug module
  – Hardware implementation or software emulation: e.g., the divider
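The divider trade-off means that, without a hardware divider, integer division must be done by a software routine. As an illustration of what such emulation involves, here is restoring shift-and-subtract division, which produces one quotient bit per loop iteration. It is a generic sketch, not the routine the Nios II toolchain actually links in; `soft_udiv` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* Unsigned 32-bit division by restoring shift-and-subtract.
 * Precondition: d != 0. Writes the remainder to *rem if non-NULL. */
uint32_t soft_udiv(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;  /* 64-bit so the shift below cannot overflow */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down next dividend bit */
        if (r >= d) {                     /* divisor fits: subtract it   */
            r -= d;
            q |= 1u << i;                 /* and set this quotient bit   */
        }
    }
    if (rem != NULL)
        *rem = (uint32_t)r;
    return q;
}
```

Taking 32 iterations for every division is exactly the cost a hardware divider avoids, at the price of extra FPGA logic.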
40
Types of Processors
41
Memory Organization
42
Cache Performance
Memory   I-Cache   D-Cache   Normalised Performance
SDRAM    No        No         40.2%
SDRAM    No        Yes        55.2%
SDRAM    Yes       No         64.3%
SDRAM    Yes       Yes        96.4%
OnChip   No        No        100.0%
OnChip   No        Yes        98.0%
OnChip   Yes       No        110.2%
OnChip   Yes       Yes       105.6%
Performance is relative to on-chip RAM with no cache, running dhry.c (Dhrystone) modified for unbuffered I/O.
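Speedups between configurations can be read off this table as ratios, assuming normalised performance is inversely proportional to run time. The sketch below encodes the table and computes such a ratio; the struct and function names are hypothetical.

```c
#include <string.h>

/* Rows of the cache performance table (percent of uncached on-chip RAM). */
typedef struct {
    const char *memory;
    int icache, dcache;  /* 1 = cache enabled */
    double perf;
} cache_result;

static const cache_result results[] = {
    {"SDRAM",  0, 0,  40.2}, {"SDRAM",  0, 1,  55.2},
    {"SDRAM",  1, 0,  64.3}, {"SDRAM",  1, 1,  96.4},
    {"OnChip", 0, 0, 100.0}, {"OnChip", 0, 1,  98.0},
    {"OnChip", 1, 0, 110.2}, {"OnChip", 1, 1, 105.6},
};

/* Look up one configuration; returns -1.0 if it is not in the table. */
double find_perf(const char *memory, int icache, int dcache)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        if (strcmp(results[i].memory, memory) == 0 &&
            results[i].icache == icache && results[i].dcache == dcache)
            return results[i].perf;
    return -1.0;
}

/* Speedup of configuration a over b, assuming performance is inversely
 * proportional to run time, so the ratio of percentages is the speedup. */
double speedup(double perf_a, double perf_b)
{
    return perf_a / perf_b;
}
```

Under that assumption, enabling both caches makes the SDRAM-based system about 2.4 times faster (96.4 / 40.2), bringing it within a few percent of uncached on-chip RAM.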
43
Tightly Coupled Memory
• Fast data buffers
• Fast sections of code
• Fast interrupt handler
• Critical loop
• Constant access time; guaranteed not to have arbitration delays
• Up to 4 tightly coupled memories
• Software Guidelines
  – Software accesses tightly-coupled memory addresses just like any other addresses.
  – Cache operations have no effect when targeting tightly-coupled memory.
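Since software accesses tightly-coupled memory like any other address, placing data there is a linker matter rather than a coding one. The sketch uses GCC's `section` variable attribute; the section name `.tcm_data` is a hypothetical placeholder, because in a real Nios II system the section that maps to the tightly-coupled memory is defined by the linker script generated for that hardware. On non-ELF hosts the attribute is omitted so the example still compiles.

```c
#include <stdint.h>

/* Hypothetical section name for the tightly-coupled data memory. */
#if defined(__ELF__)
#define IN_TCM __attribute__((section(".tcm_data")))
#else
#define IN_TCM /* no section placement on non-ELF hosts */
#endif

#define TCM_WORDS 64

/* A fast data buffer placed in tightly-coupled memory (by assumption). */
IN_TCM int32_t tcm_buffer[TCM_WORDS];

/* Ordinary loads: no special instructions or cache calls are needed,
 * and cache flush/invalidate operations would not affect this buffer. */
int32_t sum_tcm_buffer(void)
{
    int32_t sum = 0;
    for (int i = 0; i < TCM_WORDS; i++)
        sum += tcm_buffer[i];
    return sum;
}
```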
44
Pipelining
• Static branch prediction is implemented using the branch offset direction:
  – a negative offset is predicted as taken
  – a positive offset is predicted as not-taken
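This rule suits loops: the branch that closes a loop jumps backward (negative offset) and is taken on every iteration except the last. The sketch below applies the rule to a recorded sequence of branch outcomes and counts mispredictions; `predict_taken` and `count_mispredictions` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Static rule: backward branch (negative offset) => predict taken. */
bool predict_taken(int32_t branch_offset)
{
    return branch_offset < 0;
}

/* Count mispredictions for one branch over n recorded outcomes. */
int count_mispredictions(int32_t branch_offset, const bool *taken, int n)
{
    int misses = 0;
    for (int i = 0; i < n; i++)
        if (taken[i] != predict_taken(branch_offset))
            misses++;
    return misses;
}
```

For a 10-iteration loop whose closing branch is taken 9 times and then falls through, a backward offset gives a single misprediction, while the same pattern with a forward offset would give 9.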
45
46
Presentation Outline
• Basic description of Stratix Altera Devices
• NIOS II processor architecture
  – Review pipelining techniques
  – Review memory access techniques
• How to design a system using NIOS II processor
47
48
Hardware Abstraction Layer (HAL) [4]
• Isolates the application software from hardware modifications.
• Applications are device-independent because the HAL abstracts the details of devices such as:
  – Character mode devices: UART core, JTAG UART core, LCD display controller
  – Flash memory devices
  – Timer devices
  – DMA controller core
  – Ethernet MAC/PHY Controller
• HAL application program interface (API) is integrated with the ANSI C standard library.
49
Layers of HAL API [4]
• HAL library generation:
  1. SOPC Builder generates a hardware system.
  2. The Nios II IDE generates a custom HAL system library to match the hardware configuration.
• Changes in the hardware configuration automatically propagate to the HAL device driver configuration
• NIOS II is programmed in C
50
Programming NIOS II Processor [4]
• Programming a UART
  – Standard input/output routines in C
---------------------------------------------------
#include <stdio.h>
int main(void)
{
    const char *msg = "hello world";
    /* The HAL exposes the UART as a character device under /dev */
    FILE *fp = fopen("/dev/uart1", "w");
    if (fp) {
        fprintf(fp, "%s", msg);
        fclose(fp);
    }
    return 0;
}
---------------------------------------------------
51
References
1. Altera Corp., Stratix & Stratix II Module 3: Using TriMatrix Memories, 2004.
2. Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.
3. Altera Corp., Nios II Processor Reference Handbook, 2005.
4. Altera Corp., Nios II Software Developer's Handbook, 2005.