1 ALTERA FPGAs and NIOSII ELG6158 Computer Systems Architecture Miodrag Bolic.

1

ALTERA FPGAs and NIOSII

ELG6158 Computer Systems Architecture

Miodrag Bolic

2

Presentation Outline

• Basic description of Stratix Altera Devices
• NIOS II processor architecture
• How to design a system using NIOS II processor

3

Stratix EP1S10 [2]

6

TriMatrix™ Memory [1]

• M512 Blocks: 512 bits per block + parity
– Small FIFOs, Shift Register, Rake Receiver Correlator, FIR Filter Delay Line
• M4K Blocks: 4 Kbits per block + parity
– Header / Cell Storage, Channelized Functions, ATM cell–packet processing, Nios Program Memory
• M-RAM: 512 Kbits per block + parity
– Packet / Data Storage, Nios Program Memory, System Cache, Video Frame Buffers, Echo Canceller Data Storage
• Dedicated External Memory Interface
• Other listed uses: Look-Up Schemes, Packet & Cell Buffering, Cache
• More Data Ports for Greater Memory Bandwidth
• More Bits For Larger Memory Buffering

7

Memory Bandwidth Summary: Stratix Device Family [1]

Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10   920,448          1              60           94            1,245,024
EP1S20   1,669,248        2              82           194           2,096,928
EP1S25   1,944,576        2              138          224           2,894,400
EP1S30   3,317,184        4              171          295           3,750,192
EP1S40   3,423,744        4              183          384           4,384,800
EP1S60   5,215,104        6              292          574           6,762,528
EP1S80   7,427,520        9              364          767           8,784,720
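The Total RAM Bits column can be cross-checked against the block counts. The following is a minimal sketch in C, assuming parity-inclusive block sizes of 576 bits (M512), 4,608 bits (M4K) and 589,824 bits (M-RAM). These sizes and the helper name total_ram_bits are assumptions made for illustration; the slide only states the data sizes "+ parity".

```c
/* Assumed parity-inclusive block sizes in bits (not stated on the slide). */
#define M512_BITS 576L     /* 512 data bits plus parity  */
#define M4K_BITS  4608L    /* 4 Kbits plus parity        */
#define MRAM_BITS 589824L  /* 512 Kbits plus parity      */

/* Hypothetical helper: total RAM bits computed from the three block counts. */
long total_ram_bits(long mram_blocks, long m4k_blocks, long m512_blocks)
{
    return mram_blocks * MRAM_BITS
         + m4k_blocks  * M4K_BITS
         + m512_blocks * M512_BITS;
}
```

Under these assumptions, total_ram_bits(1, 60, 94) gives 920,448, which matches the EP1S10 row, and total_ram_bits(9, 364, 767) gives 7,427,520, which matches the EP1S80 row.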

9

Logic Array Blocks (LAB) [2]

• 10 LEs
• Local Interconnect
• LAB-Wide Control Signals

[Figure: a LAB containing LE1 to LE10, each LE connection labeled 4, with the Local Interconnect and the Control Signals]

10

LAB Arrangement

• LABs Communicate Directly to Each Other & Other Blocks Both Horizontally & Vertically

[Figure: grid of LABs and M512 blocks, with a LAB Row and a LAB Column marked]

11

Logic Elements

• Smallest Units of Logic
• Used for Combinatorial/Registered Logic

[Figure: Stratix™ LE with its connections: Carry-In, Carry-Out, LUT Chain Input, LUT Chain Output, Register Chain Input, Register Chain Output, General Routing & Local Routing]

12

Total LE Resources

Device   Total LEs
EP1S10   10,570
EP1S20   18,460
EP1S25   25,660
EP1S30   32,470
EP1S40   41,250
EP1S60   57,120
EP1S80   79,040

13

LE Datasheet Image

14

LE Features

• 4-Input Look-Up Table (LUT)
• Configurable Register
• 2 Operation Modes
• Dynamic Add/Subtract Control
• Carry-Select Chain Logic
• Performance-Enhancing Features
– LUT & Register Chain
• Area-Enhancing Features
– Register Packing & Feedback
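To make the 4-input LUT concrete: it can be thought of as a 16-entry truth table, one output bit per input combination. The sketch below models that idea in C. The names lut4_t, lut4_eval, lut4_program and xor4, and the choice of data1 as the least significant index bit, are assumptions for illustration and do not describe the actual Stratix configuration format.

```c
#include <stdint.h>

/* A 4-input LUT stores one output bit for each of the 16 input combinations. */
typedef uint16_t lut4_t;

/* Evaluate the LUT for inputs d1..d4 (each 0 or 1).
 * d1 is used as the least significant index bit (arbitrary choice). */
int lut4_eval(lut4_t mask, int d1, int d2, int d3, int d4)
{
    unsigned index = ((unsigned)(d1 & 1))
                   | ((unsigned)(d2 & 1) << 1)
                   | ((unsigned)(d3 & 1) << 2)
                   | ((unsigned)(d4 & 1) << 3);
    return (int)((mask >> index) & 1u);
}

/* Build the 16-bit mask for any 4-input Boolean function. */
lut4_t lut4_program(int (*fn)(int, int, int, int))
{
    lut4_t mask = 0;
    for (unsigned i = 0; i < 16; i++) {
        if (fn((int)(i & 1u), (int)((i >> 1) & 1u),
               (int)((i >> 2) & 1u), (int)((i >> 3) & 1u)))
            mask |= (lut4_t)(1u << i);
    }
    return mask;
}

/* Example function: 4-input XOR (odd parity). */
int xor4(int a, int b, int c, int d)
{
    return a ^ b ^ c ^ d;
}
```

With this bit ordering, lut4_program(xor4) yields the mask 0x6996, and a 4-input AND corresponds to the mask 0x8000. Any function of four inputs fits in one such table, which is why a single LE can implement arbitrary 4-input combinatorial logic.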

15

LE Inputs/Outputs

• Inputs
– 4 Data
– 2 LE Carry-Ins & 1 LAB Carry-In
– 1 Dynamic Addition/Subtraction Control
– Register Controls
• Outputs
– 2 LE Carry-Outs
– 2 Row/Column/DirectLink Outputs
– 1 Local Output
– 1 LUT Chain & 1 Register Chain

16

Operation Modes

• Normal
– General Combinatorial or Registered Logic
• Dynamic Arithmetic
– Used for Adders, Counters, Accumulators, Comparators
– Uses Carry Chain for Faster Operation
• Chosen Automatically by Quartus® II & NativeLink® Synthesis Tools
– Based on Design & Design Constraints

17

LE Register Controls

• Clock/Clock Enable
• Synchronous & Asynchronous Clear
• Synchronous & Asynchronous Load & Data
• Asynchronous Preset
– Preset Function Loads a ‘1’

[Figure: LE register with ports D, Q, ENA, CLRN, ALD/PRE and ADATA]

18

Normal Mode

[Figure: LE in normal mode. Labels: data1, data2, data3, data4, cin, addnsub (2), 4-Input LUT, Sync Load & Clear Logic, register (D, DATA), Register Control Signals, Register Feedback, LUT Chain Input, LUT Chain Output, Register Chain Input, Register Chain Output, Row, Column & DirectLink Routing, Local Routing]

Note:
1) Functional Diagram Only. Please See Datasheet for more Details.
2) Addnsub & data1 connected via XOR logic

19

Combinatorial Logic Only


[Figure: normal-mode LE diagram for combinatorial-only use. Labels: data1, data2, data3, data4, cin, addnsub (2), 4-Input LUT, LUT Chain Input, LUT Chain Output, Register Feedback, Routing, Local Routing]

20

Sequential Logic Only


[Figure: normal-mode LE diagram for sequential-only use. Labels: data1, data2, data3, data4, cin, addnsub (2), 4-Input LUT, register (D, DATA), LUT Chain Input, LUT Chain Output, Register Feedback, Routing, Local Routing]

21

Dynamic Arithmetic Mode


[Figure: LE in dynamic arithmetic mode. Labels: data1, data2, data3, addnsub, LAB Carry-In, Carry-In0, Carry-In1, Carry-In Logic, Sum Calculator, Carry Calculator, Carry-Out Logic, Carry-Out0, Carry-Out1, register (D, DATA), Routing, Local Routing]

Note: Functional Diagram Only. Please See Datasheet for more Details.

22

Carry-Select Logic

• Each Cell Pre-Calculates Sum & Carry-Out for Carry = 1 & Carry = 0

• Carry-In Selects which Pre-Calculation Is Used

[Figure: a Single LUT pre-calculates A0+B0+1 and A0+B0+0; CIN drives a 0/1 selection between the two results. Labels: SUMOUT, COUT0, COUT1, COUT]
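The carry-select idea can be sketched in C as follows. This is a behavioural model only, assuming single-bit operands per cell; the names cs_result, carry_select_bit and carry_select_add are hypothetical, and the loop does not reproduce the LAB-level chain structure of the device.

```c
/* Result of one carry-select cell. */
typedef struct {
    int sum;
    int cout;
} cs_result;

/* One-bit carry-select cell: both candidate results are computed before
 * the carry-in is known; the carry-in only selects between them. */
cs_result carry_select_bit(int a, int b, int cin)
{
    /* Pre-calculation assuming carry-in = 0 (A + B + 0). */
    int sum0  = a ^ b;
    int cout0 = a & b;
    /* Pre-calculation assuming carry-in = 1 (A + B + 1). */
    int sum1  = a ^ b ^ 1;
    int cout1 = a | b;

    cs_result r;
    r.sum  = cin ? sum1  : sum0;
    r.cout = cin ? cout1 : cout0;
    return r;
}

/* Add two values of the given bit width by chaining the cells.
 * The final carry is written to *carry_out when it is not NULL. */
unsigned carry_select_add(unsigned a, unsigned b, int width, int *carry_out)
{
    unsigned sum = 0;
    int carry = 0;
    for (int i = 0; i < width; i++) {
        cs_result r = carry_select_bit((int)((a >> i) & 1u),
                                       (int)((b >> i) & 1u), carry);
        sum |= (unsigned)r.sum << i;
        carry = r.cout;
    }
    if (carry_out)
        *carry_out = carry;
    return sum;
}
```

For example, carry_select_add(15, 1, 4, &c) returns 0 with c set to 1. The point of the scheme is that the two additions per cell do not wait for the incoming carry, so only the final selection sits on the carry path.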

23

Carry Chain Details

• Carry Chains Begin & End in Any LE

• 2 Carry Chains Can Exist In Any LAB

• Carry-Select Generated in LEs 5 & 10
– Every LE Not in Critical Timing Path

[Figure: LAB carry chain. LE1 to LE10 produce Sum1 to Sum10 from inputs A1/B1 to A10/B10, between LAB Carry-In and LAB Carry-Out, with two 0/1 carry selections shown]

24

LUT & Register Chains

• LUT Chain
– Output of LUT Connects Directly to LUT Below
– Available Only In Normal Mode
– Ex. Wide Fan-In Functions
• Register Chain
– Output of Register Connects Directly to Register Below (Shift Register)
– LUT Can Be Used for Unrelated Function
– Ex. LE Shift Register
• Both Chains End at LAB Boundary

[Figure: LE1, LE2 and LEs 3 - 10, each with a LUT and a D/Q register, linked by the LUT Chain and the Register Chain]

25

Stratix Interconnects

• Global Signals
• LE & Register Chains
• Carry Chains
• Local Interconnect
• DirectLink™
• MultiTrack Interconnects
– Row Interconnects
– Column Interconnects

26

Local Interconnect

• Groups 10 LEs Together
• Provides Input Signals to Blocks (LABs, Memory, DSP Blocks)

[Figure: a LAB and an M512 block, each fed by its own Local Interconnect. Caption: # of Local Lines Depends on Block]

27

DirectLink

• Allows Blocks to Drive Local Interconnects of Neighboring Blocks in the Same Row

[Figure: two LABs (each with LE1 to LE10) and an M512 block side by side, each with its own Local Interconnect, showing neighboring blocks driving each other's local interconnects]

28

DirectLink (cont.)

• Provides Fast Communication between Neighboring Blocks
– One LE Has Fast Access to Up to 29 Other LEs in Area (9 in its own LAB plus 10 in each adjacent LAB)

• Saves Row Resources

29

MultiTrack Interconnect Architecture

• Provides Connections between All Device Blocks
• Series of 3 Types of Continuous Row & Column Interconnects
– Each Has a Fixed Speed and Length
– Constant Performance Across Family Members within Given Area
– Simplifies Block Design
• Same Routing Resources Available Regardless of Location

30

Row Resources

• 3 Row Interconnect Lengths
– R4
– R8
– R24

[Figure: R4, R8 and R24 row interconnects. Labels: 4 LABs, 160 Lines Wide, 48 Lines Wide, 24 Lines Wide]

31

Row Resources (cont.)

• Each Block Has Own Row Resource to Drive Right and Left

[Figure: row of blocks with one R4 Routing Line Driving Right and one R4 Routing Line Driving Left]

32

Row Resource Details

• R4
– Terminate at M-RAM
• R8
– Only Connect to Local & R8/C8 Interconnects
– Terminate at M-RAM
– Faster than 2 R4s
• R24
– Do Not Interface with Blocks Directly
– Can Cross M-RAM
– Fastest Resource for Long Connections (Ex. Design Block to Design Block)

33

Column Resources

• 3 Interconnect Lengths
– C4
– C8
– C16
• Features Similar to Row Interconnects
– Each Block Has Column Resource to Drive Up and Down
– Interconnects Are Staggered
– Interconnects Can Drive End-to-End

[Figure: C4, C8 and C16 column interconnects. Label: 4 LABs]

34


• Basic description of Stratix Altera Devices
• NIOS II processor architecture
• How to design a system using NIOS II processor

36

NIOS II Overview [3]

• Soft IP Core
– A soft-core processor is a microprocessor fully described in software, usually in an HDL, which can be synthesized in programmable hardware, such as FPGAs.
• Reduced Instruction Set Computer (RISC)
• No pipeline, or 5- or 6-stage pipeline configurations
• Full 32-bit instruction set, data path, and address space
• 32 general-purpose registers
• 32 external interrupt sources
• Access to a variety of on-chip peripherals, and interfaces to off-chip memories and peripherals
• Software development environment based on the GNU C/C++ tool chain and Eclipse IDE

37

NIOS II Scalability

• Powerful multiprocessing systems can be built

38

NIOS II Processor Core [3]

39

Implementation

• The functional units of the Nios II architecture form the foundation for the Nios II instruction set.

• The Nios II architecture describes an instruction set, not a particular hardware implementation.

• Trade-offs:
– More or less of a feature: the amount of instruction cache memory.
– Inclusion or exclusion of a feature: the JTAG debug module.
– Hardware implementation or software emulation: the divider.

40

Types of Processors

41

Memory Organization

42

Cache Performance

Memory   I-Cache   D-Cache   Normalised Performance
SDRAM    No        No         40.2%
SDRAM    No        Yes        55.2%
SDRAM    Yes       No         64.3%
SDRAM    Yes       Yes        96.4%
OnChip   No        No        100.0%
OnChip   No        Yes        98.0%
OnChip   Yes       No        110.2%
OnChip   Yes       Yes       105.6%

Performance is relative to on-chip RAM with no cache, running dhry.c modified for unbuffered I/O.

43

Tightly Coupled Memory

• Fast data buffers
• Fast sections of code
• Fast interrupt handler
• Critical loop
• Constant access time; guaranteed not to have arbitration delays
• Up to 4 tightly coupled memories
• Software Guidelines
– Software accesses tightly-coupled memory addresses just like any other addresses.
– Cache operations have no effect when targeting tightly-coupled memory.

44

Pipelining

• Static branch prediction is implemented using the branch offset direction:
– a negative offset is predicted as taken
– a positive offset is predicted as not-taken
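The offset-direction rule can be written down in a few lines of C. This is a minimal sketch of the rule as stated on the slide, not of the processor's hardware; the function names and the loop model are hypothetical, and an offset of zero (not covered by the slide) is treated as not-taken here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Predict using only the sign of the branch offset:
 * negative (backward) branches are predicted taken. */
bool predict_taken(int32_t branch_offset)
{
    return branch_offset < 0;
}

/* Hypothetical model: a loop branch that is taken on every iteration
 * except the last one. Returns how many times the static prediction
 * disagrees with the actual outcome. */
unsigned loop_mispredictions(unsigned iterations, int32_t branch_offset)
{
    unsigned misses = 0;
    for (unsigned i = 0; i < iterations; i++) {
        bool actually_taken = (i + 1 < iterations);
        if (predict_taken(branch_offset) != actually_taken)
            misses++;
    }
    return misses;
}
```

In this model a 100-iteration loop closed by a backward branch is mispredicted once (on exit), whereas the same branch pattern with a positive offset would be mispredicted 99 times. Backward branches usually close loops, which is why predicting them as taken tends to pay off.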

46


• Basic description of Stratix Altera Devices
• NIOS II processor architecture
– Review pipelining techniques
– Review memory access techniques
• How to design a system using NIOS II processor

48

Hardware Abstraction Layer (HAL) [4]

• Isolates the application software from hardware modifications.

• Applications are device-independent because the HAL abstracts devices such as:
– Character mode devices: UART core, JTAG UART core, LCD display controller
– Flash memory devices
– Timer devices
– DMA controller core
– Ethernet MAC/PHY Controller

• HAL application program interface (API) is integrated with the ANSI C standard library.
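Because character devices are reached through the standard C file API, application code can stay independent of the particular device. The sketch below assumes a hypothetical helper, write_message, that takes the device name as a parameter; on a Nios II HAL system the name would be a device node such as "/dev/uart1", while on a development host the same function can be exercised against an ordinary file.

```c
#include <stdio.h>

/* Write a message to a character device (or any file) by name.
 * Returns 0 on success, -1 if the device cannot be opened or written. */
int write_message(const char *device_path, const char *msg)
{
    FILE *fp = fopen(device_path, "w");
    if (fp == NULL)
        return -1;

    int ok = (fputs(msg, fp) >= 0);
    /* fclose flushes buffered output, so its result matters too. */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Only the string passed as device_path changes when the hardware configuration changes; the calling code does not.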

49

Layers of HAL API [4]

• HAL library generation:
1. SOPC Builder generates a hardware system
2. Nios II IDE generates a custom HAL system library to match the hardware configuration

• Changes in the hardware configuration automatically propagate to the HAL device driver configuration

• NIOS II is programmed in C

50

Programming NIOS II Processor [4]

• Programming UART
– Standard Input, Standard Output routines in C

---------------------------------------------------
#include <stdio.h>
#include <string.h>

int main (void)
{
    const char* msg = "hello world";
    FILE* fp;

    /* Open the UART device for writing through the standard C library. */
    fp = fopen ("/dev/uart1", "w");
    if (fp)
    {
        /* Send the message, then release the device. */
        fprintf(fp, "%s", msg);
        fclose (fp);
    }
    return 0;
}
---------------------------------------------------

51

References

1. Altera Corp., Stratix & Stratix II Module 3: Using TriMatrix Memories, 2004.

2. Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.

3. Altera Corp., Nios II Processor Reference Handbook, 2005. http://www.altera.com/literature/hb/nios2/n2cpu_nii5v1.pdf

4. Altera Corp., Nios II Software Developer's Handbook, 2005. http://www.altera.com/literature/hb/nios2/n2sw_nii5v2.pdf
