DEMETRIOU DEMETRIS 02872 - FIT...

Frederick Institute of Technology

EE445 – Microprocessor Interfacing Design

Xilinx MicroBlaze/Picoblaze Soft Processor Core

DEMETRIOU DEMETRIS 02872

MSc in Electrical Engineering

Supervisor: Dr. Tatas Konstantinos

Spring 2007

1. Introduction

MicroBlaze is one of the industry's most flexible embedded processing solutions. Meeting performance, feature, and cost targets at the same time can be very challenging in today's competitive environment. With the features of Xilinx FPGA technology, including increased performance and higher-density devices, scalable processor systems can adapt to changing processing needs. The Xilinx MicroBlaze™ solution delivers a flexible processor system that is easy to use, area-efficient, optimized for cost-sensitive designs, and able to give support into the future. Because the processor is a soft core, designers can choose any combination of its highly customizable features, which helps bring products to market faster, extend a product's life cycle, and avoid processor obsolescence.

The MicroBlaze™ soft processor is a 32-bit Harvard RISC architecture optimized for Xilinx FPGAs.

MicroBlaze Soft Processor Core

Xilinx's 32-bit MicroBlaze™ soft processor solution is a soft core processor, meaning that it is implemented using general logic primitives rather than a hard, dedicated block in the FPGA.

The MicroBlaze soft core is licensed as part of the Xilinx Embedded Development Kit (EDK). The EDK is a complete embedded development solution that includes a library of peripheral IP cores, the Xilinx Platform Studio tool suite for intuitive hardware system creation, an Eclipse-based software development environment, the GNU compiler and debugger, and more. The MicroBlaze processor is also supported by third-party development tools and Real Time Operating Systems (RTOS).

Key Features:
● Cost-efficient, high performance 32-bit soft processor
● Optimized for Xilinx FPGAs
● Highly configurable feature set
● Co-processor interface for hardware acceleration
● Fully supported by the Platform Studio embedded development environment
● JTAG-based integrated debug support
● Optional integrated single precision FPU

There are literally dozens of 8-bit microcontroller architectures and instruction sets. Modern FPGAs can efficiently implement practically any 8-bit microcontroller, and available FPGA soft cores support popular instruction sets such as the PIC, 8051, AVR, 6502, 8080, and Z80 microcontrollers. The Xilinx PicoBlaze microcontroller is specifically designed and optimized for the Virtex and Spartan series of FPGAs and CoolRunner-II CPLDs. The PicoBlaze solution consumes considerably fewer resources than some other comparable 8-bit microcontroller architectures. It is provided as a free, source-level VHDL file with royalty-free re-use within Xilinx FPGAs. Because it is delivered as VHDL source, the PicoBlaze microcontroller is immune to product obsolescence: it can be retargeted to future generations of Xilinx FPGAs, exploiting future cost reductions and feature enhancements.

The PicoBlaze™ microcontroller is a compact, capable, and cost-effective fully embedded 8-bit RISC microcontroller core optimized for the Spartan™-3, Virtex™-II, and Virtex-II Pro™ FPGA families. The PicoBlaze microcontroller provides cost-efficient microcontroller-based control and simple data processing.

Key Features:
• 16 byte-wide general-purpose data registers
• 1K instructions of programmable on-chip program store, automatically loaded during FPGA configuration
• Byte-wide Arithmetic Logic Unit (ALU) with CARRY and ZERO indicator flags
• 64-byte internal scratchpad RAM
• 256 input and 256 output ports for easy expansion and enhancement
• Automatic 31-location CALL/RETURN stack
• Predictable performance, always two clock cycles per instruction, up to 200 MHz or 100 MIPS in a Virtex-4™ FPGA and 88 MHz or 44 MIPS in a Spartan-3 FPGA
• Fast interrupt response; worst-case 5 clock cycles
• Assembler, instruction-set simulator support

PicoBlaze Microcontroller Embedded within an FPGA Provides the Optimal Balance between Microcontroller and FPGA Solutions:

2. General Description

RISC (Reduced Instruction Set Computer)

RISC is a computer architecture that reduces chip complexity by using a limited number of simpler instructions. RISC compilers have to generate software routines to perform complex operations that were previously done in hardware by CISC computers. In RISC, the microcode layer and its associated overhead are eliminated.

RISC keeps instruction size constant, bans the indirect addressing mode and retains only those instructions that can be overlapped and made to execute in one machine cycle or less. The RISC chip is faster than its CISC counterpart and is designed and built more economically.

RISC became popular in microprocessors in the 1980s. The traditional CISC (Complex Instruction Set Computing) architecture uses many instructions that do long, complex operations. Each RISC instruction is executed much more quickly than a CISC instruction, and most computational tasks can be processed faster. Modern instruction sets combine attributes of CISC and RISC.

A RISC processor uses a minimal instruction set, emphasizing the instructions used most often and optimizing them for the fastest possible execution. Software for RISC processors must handle more operations than software for traditional CISC (Complex Instruction Set Computer) processors, but RISC processors have advantages in applications that benefit from faster instruction execution, such as engineering and graphics workstations and parallel-processing systems. They are also less costly to design, test, and manufacture. In the mid-1990s RISC processors began to be used in personal computers instead of the CISC processors that had been used since the introduction of the microprocessor.

In computer science, RISC also refers to a computer in which the compiler and hardware are interlocked: the compiler takes over some of the hardware functions of conventional computers and translates high-level-language programs directly into low-level machine code.

RISC is a CPU design philosophy that favors a reduced set of simpler instructions. The most common RISC microprocessors are Alpha, ARC, ARM, AVR, MIPS, PA-RISC, PIC, Power Architecture, and SPARC.

The idea was originally inspired by the discovery that many of the features that were included in traditional CPU designs to facilitate coding were being ignored by the programs that were running on them. Also these more complex features took several processor cycles to be performed. Additionally, the performance gap between the processor and main memory was increasing. This led to a number of techniques to streamline processing within the CPU, while at the same time attempting to reduce the total number of memory accesses.

RISC vs. CISC

The RISC machine executes instructions faster because it does not have to go through a microcode conversion layer, but the RISC compiler generates more instructions than the CISC compiler for the same processing.

The simplest way to examine the advantages and disadvantages of RISC architecture is by contrasting it with its predecessor: CISC architecture.

CISC
● Emphasis on hardware
● Includes multi-clock complex instructions
● Memory-to-memory: "LOAD" and "STORE" incorporated in instructions
● Small code sizes, high cycles per second
● Transistors used for storing complex instructions

RISC
● Emphasis on software
● Single-clock, reduced instruction only
● Register to register: "LOAD" and "STORE" are independent instructions
● Low cycles per second, large code sizes
● Spends more transistors on memory registers

The Overall RISC Advantage

Today, the Intel x86 is arguably the only chip that retains CISC architecture. This is primarily due to advancements in other areas of computer technology. The price of RAM has decreased dramatically. In 1977, 1MB of DRAM cost about $5,000. By 1994, the same amount of memory cost only $6 (when adjusted for inflation). Compiler technology has also become more sophisticated, so that the RISC use of RAM and emphasis on software has become ideal.

Harvard architecture

Harvard architecture is a computer architecture with physically separate storage and signal pathways for instructions and data. The term originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape (24 bits wide) and data in electro-mechanical counters (23 digits wide). These early machines had limited data storage, entirely contained within the data processing unit, and provided no access to the instruction storage as data, making loading and modifying programs an entirely offline process.

Memory details

In a Harvard architecture, there is no need to make the two memories share characteristics. In particular, the word width, timing, implementation technology, and memory address structure can differ. Instruction memory is often wider than data memory. In some systems, instructions can be stored in read-only memory while data memory generally requires random-access memory. In some systems, there is much more instruction memory than data memory so instruction addresses are much wider than data addresses.

Other models

In a computer with the contrasting von Neumann architecture, the CPU can be either reading an instruction or reading/writing data from/to the memory. Both cannot occur at the same time since the instructions and data use the same signal pathways and memory. In a computer with Harvard architecture, the CPU can read both an instruction and data from memory at the same time. A computer with Harvard architecture can be faster because it is able to fetch the next instruction at the same time it completes the current instruction. Speed is gained at the expense of more complex electrical circuitry.

Speed

In recent years the speed of the CPU has grown many times in comparison to the access speed of the main memory. Care needs to be taken to reduce the number of times main memory is accessed in order to maintain performance. If, for instance, every instruction run in the CPU requires an access to memory, the computer gains nothing from increased CPU speed, a problem referred to as being memory bound.

Memory can be made much faster, but only at high cost. The solution then is to provide a small amount of very fast memory known as a cache. As long as the memory that the CPU needs is in the cache, the performance hit is much smaller than it is when the cache has to turn around and get the data from the main memory. Tuning the cache is an important aspect of computer design.

Modern high performance CPU chip designs incorporate aspects of both Harvard and von Neumann architecture. On chip cache memory is divided into an instruction cache and a data cache. Harvard architecture is used as the CPU accesses the cache. In the case of a cache miss, however, the data is retrieved from the main memory, which is not divided into separate instruction and data sections. Thus a von Neumann architecture is used for off chip memory access.

Uses

Harvard architectures are also frequently used in:

● Specialized digital signal processors, DSPs, commonly used in audio or video processing products. For example, Blackfin processors by Analog Devices, Inc. use a Harvard architecture.

● Most general purpose small microcontrollers used in many electronics applications, such as the PIC by Microchip Technology, Inc., and AVR by Atmel Corp. These processors are characterized by having small amounts of program and data memory, and take advantage of the Harvard architecture and reduced instruction sets (RISC) to ensure that most instructions can be executed within only one machine cycle. The separate storage means the program and data memories can have different bit depths. Example: PICs have an 8-bit data word but (depending on specific range of PICs) a 12-, 14-, or 16-bit program word. This allows a single instruction to contain a full-size data constant. Other RISC architectures, for example the ARM, typically must use at least two instructions to load a full-size constant.

3. MicroBlaze Processor

The MicroBlaze Advantage

The MicroBlaze core is a 32-bit RISC Harvard architecture soft processor core with 32 general purpose registers, an ALU, and a rich instruction set optimized for embedded applications. It supports on-chip block RAM, external memory, or both. With the MicroBlaze soft processor solution, there is complete flexibility to select any combination of peripherals, memory, and interface features needed to give the best system performance at the lowest cost on a single FPGA.

Hardware Acceleration using Fast Simplex Link

The MicroBlaze Fast Simplex Link (FSL) lets you connect hardware co-processors to accelerate time-critical algorithms. The FSL channels are dedicated point-to-point data streaming interfaces. Each FSL channel provides a low latency interface to the processor pipeline, making the channels ideal for extending the processor’s execution unit with custom hardware accelerators.

Floating-Point Unit Support

MicroBlaze introduces an integrated single precision, IEEE-754 compatible Floating Point Unit (FPU) option optimized for embedded applications such as industrial control, automotive, and office automation. The MicroBlaze FPU provides designers with a processor tailored to execute both integer and floating point operations.

Hardware Configurability

The MicroBlaze processor solution provides a high level of configurability to tailor the processor sub-system to the exact needs of the target embedded application. Configurable features such as the barrel shifter, divider, multiplier, instruction and data caches, FPU, FSL interfaces, hardware debug logic, and hardware exceptions provide great flexibility but do not add to the cost if they are not used.

Platform Studio Tool Suite

The Embedded Development Kit (EDK) is an all-encompassing solution for designing embedded programmable systems. This pre-configured kit includes the award-winning Platform Studio™ Tool Suite, the MicroBlaze soft processor core, and all the documentation and soft peripheral IP required for designing FPGA-based embedded processor systems.

Embedded Development Kit and Platform Studio Tool Suite

For development, Xilinx offers the Embedded Development Kit (EDK), which is the common design environment for both MicroBlaze and PowerPC-based embedded systems. The EDK is a set of microprocessor design tools and common software platforms, such as device drivers and protocol stacks. The EDK includes the Platform Studio tool suite, the MicroBlaze core, and a library of peripheral IP cores.

Using these tools, design engineers can define the processor subsystem hardware and configure the software platform, including generating a Board Support Package (BSP) for a variety of development boards. Platform Studio Software Development Kit (SDK) is based on the Eclipse open-source C development tool kit and includes a full-featured development environment and a feature-rich GUI debugger. The MicroBlaze processor is supported by the GNU compiler and debugger tools. The debugger connects to the MicroBlaze processor via JTAG. For debugging visibility and control over the embedded system, design engineers can add the ChipScope Pro™ verification tools from Xilinx, which are integrated into the hardware/software debug capabilities of the EDK.

Note: Processor performance and size will vary with configuration options.

MicroBlaze Hardware Options and Configurable Blocks

Hardware Functions
● Hardware Barrel Shifter
● Hardware Divider
● Machine Status Set and Clear Instructions
● Hardware Exception Support
● Processor Version Register
● Floating-Point Unit (FPU)
● Hardware Multiplier
● Hardware Debug Logic

Cache Options
● Configurable size 2 kB - 64 kB
● Configurable micro-cache size 64 B - 1024 B
● 4 or 8 word cache lines

Bus Infrastructure
● On-Chip Peripheral Bus (OPB) for interfacing to peripherals
● Local Memory Bus (LMB) for fast local access memory
● Fast Simplex Link (FSL) for interfacing to co-processors

4. MicroBlaze Architecture

The basic architecture consists of 32 general-purpose registers, an Arithmetic Logic Unit (ALU), a shift unit, and two levels of interrupt. This basic design can be configured with more advanced features, allowing the user to balance the required performance of the target application against the logic area cost of the soft processor.

The general purpose processor interface conforms to the CoreConnect On-Chip Peripheral Bus (OPB) standard.

The Fast Simplex Link (FSL) is a simple and powerful point-to-point interface that connects user-developed co-processors to the MicroBlaze pipeline.

Configurable Features

● Floating-Point Unit
➢ IEEE 754 compatible
➢ Single precision

● Hardware Exception Support
➢ Unaligned access
➢ Illegal instruction
➢ Data bus error
➢ Instruction bus error
➢ Divide-by-zero
➢ Floating-point exceptions

● Fast Simplex Link Co-Processor Interface
➢ Direct access to the general purpose registers for hardware acceleration
➢ Up to 8 dedicated 32-bit input ports
➢ Up to 8 dedicated 32-bit output ports

● Instruction and Data Caches
➢ Configurable cache sizes: 2 KBytes to 64 KBytes (uses Block RAM resources) or 64 Bytes to 1024 Bytes (uses LUT/Distributed RAM resources)
➢ Configurable cacheable range
➢ Direct mapped write-through operation
➢ 4 or 8 word cache lines

● System Interface: different combinations of OPB, LMB, XCL and FSL for flexible system design
➢ On-chip Peripheral Bus (OPB) for interfacing to peripherals
➢ Local Memory Bus (LMB) for fast local access memory
➢ Fast Simplex Link (FSL) for interfacing to co-processors

● Barrel Shifter
➢ 2 cycle operation

● Hardware Integer Divide
➢ 34 cycle operation

● Hardware Multiply
➢ 3 cycle operation
➢ Optional 64-bit multiply result

● Debug Logic
➢ JTAG control via a debug support core
➢ Up to 8 hardware break points
➢ Up to 4 read address watch points
➢ Up to 4 write address watch points

● Instruction Set Extensions
➢ Pattern Compare Instructions
➢ Machine Status Register Set and Clear

● Interrupt Signaling
➢ Edge or level
➢ Active high or low

After this overview of MicroBlaze™ features and architecture, the following sections give further information, including the Big-Endian bit-reversed format, the 32-bit general purpose registers, cache software support, and the Fast Simplex Link interfaces. The following topics are covered:

● “Data Types and Endianness”
● “Instructions”
● “Registers”
● “Pipeline Architecture”
● “Memory Architecture”
● “Reset, Interrupts, Exceptions, and Break”
● “Instruction Cache”
● “Data Cache”
● “Floating Point Unit (FPU)”
● “Fast Simplex Link (FSL)”
● “Debug and Trace”

Data Types and Endianness

MicroBlaze uses Big-Endian bit-reversed format to represent data. The hardware supported data types for MicroBlaze are word, half word, and byte. The bit and byte organization for each type is shown in the following tables.

Word Data Type

Half Word Data Type

Byte Data Type
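As a rough illustration of the Big-Endian bit-reversed convention described above, the following Python sketch lays out a word at ascending byte addresses and reads single bits using MicroBlaze numbering, where bit 0 is the most significant bit. The helper names to_big_endian_bytes and get_bit are hypothetical and exist only for this example.

```python
import struct

WORD_BITS = 32

def to_big_endian_bytes(word):
    """Return the 4 bytes of a 32-bit word in ascending address order."""
    return struct.pack(">I", word & 0xFFFFFFFF)

def get_bit(word, n):
    """Read bit n with bit-reversed numbering: bit 0 is the MSB, bit 31 the LSB."""
    return (word >> (WORD_BITS - 1 - n)) & 1

word = 0x12345678
layout = to_big_endian_bytes(word)   # the most significant byte comes first
top_bit = get_bit(0x80000000, 0)     # bit 0 selects the most significant bit
```

The same numbering applies to the instruction fields and register bits discussed in the rest of this section.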

Instructions

All MicroBlaze instructions are 32 bits and are defined as either Type A or Type B. Type A instructions have up to two source register operands and one destination register operand. Type B instructions have one source register and a 16-bit immediate operand (which can be extended to 32 bits by preceding the Type B instruction with an IMM instruction). Type B instructions have a single destination register operand. Instructions are provided in the following functional categories: arithmetic, logical, branch, load/store, and special.

Type A

Type B
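The two formats can be sketched in Python as follows. The field positions used here (a 6-bit opcode in bits 0-5, rD in bits 6-10, rA in bits 11-15, then either rB in bits 16-20 or a 16-bit immediate in bits 16-31) are an assumption based on the usual MicroBlaze layout, because the format figures are not reproduced in this text. The function names are hypothetical.

```python
def encode_type_a(opcode, rd, ra, rb):
    """Pack a Type A instruction: opcode, destination and two source registers."""
    return (((opcode & 0x3F) << 26) | ((rd & 0x1F) << 21)
            | ((ra & 0x1F) << 16) | ((rb & 0x1F) << 11))

def encode_type_b(opcode, rd, ra, imm):
    """Pack a Type B instruction: opcode, destination, one source, 16-bit immediate."""
    return (((opcode & 0x3F) << 26) | ((rd & 0x1F) << 21)
            | ((ra & 0x1F) << 16) | (imm & 0xFFFF))

def decode_fields(instr):
    """Split a 32-bit instruction word into the fields of both formats."""
    instr &= 0xFFFFFFFF
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 0-5 (bit 0 is the MSB)
        "rD": (instr >> 21) & 0x1F,      # bits 6-10
        "rA": (instr >> 16) & 0x1F,      # bits 11-15
        "rB": (instr >> 11) & 0x1F,      # bits 16-20, meaningful for Type A
        "imm": instr & 0xFFFF,           # bits 16-31, meaningful for Type B
    }
```

Under this assumed layout, a Type A word with opcode 0 and all register fields 0 encodes to 0x00000000.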

Some MicroBlaze instructions are described next. For each instruction, the mnemonic, encoding, a description, pseudocode of its semantics, and a list of the registers that it modifies are provided.

add - Arithmetic Add

add rD, rA, rB      Add
addc rD, rA, rB     Add with Carry
addk rD, rA, rB     Add and Keep Carry
addkc rD, rA, rB    Add with Carry and Keep Carry

Description: The sum of the contents of registers rA and rB is placed into register rD.

Bit 3 of the instruction (labeled as K in the figure) is set to one for the mnemonic addk. Bit 4 of the instruction (labeled as C in the figure) is set to one for the mnemonic addc. Both bits are set to one for the mnemonic addkc.

When an add instruction has bit 3 set (addk, addkc), the carry flag will Keep its previous value regardless of the outcome of the execution of the instruction. If bit 3 is cleared (add, addc), then the carry flag will be affected by the execution of the instruction.

When bit 4 of the instruction is set to one (addc, addkc), the content of the carry flag (MSR[C]) affects the execution of the instruction. When bit 4 is cleared (add, addk), the content of the carry flag does not affect the execution of the instruction (providing a normal addition).

Pseudocode:
if C = 0 then
    (rD) ← (rA) + (rB)
else
    (rD) ← (rA) + (rB) + MSR[C]
if K = 0 then
    MSR[C] ← CarryOut

Registers Altered:
● rD
● MSR[C]

Latency: 1 cycle

Note: The C bit in the instruction opcode is not the same as the carry bit in the MSR.

The “add r0, r0, r0” (= 0x00000000) instruction is never used by the compiler and usually indicates uninitialized memory. If illegal instruction exceptions are in use, these instructions can be trapped by setting the MicroBlaze option C_OPCODE_0x0_ILLEGAL=1.
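The four add variants can be modelled with one small Python function that follows the pseudocode above. This is only a behavioural sketch; the names add_variant and add64 are hypothetical, and the k and c arguments stand for the K and C bits of the opcode, not for the carry flag in the MSR.

```python
MASK32 = 0xFFFFFFFF

def add_variant(ra, rb, msr_c, k=0, c=0):
    """Model add (k=0, c=0), addc (c=1), addk (k=1) and addkc (k=1, c=1).

    Returns the pair (rD, new MSR[C]).
    """
    total = (ra & MASK32) + (rb & MASK32) + (msr_c if c else 0)
    rd = total & MASK32
    carry_out = total >> 32
    new_c = msr_c if k else carry_out   # K set: the carry flag keeps its value
    return rd, new_c

def add64(a, b):
    """Add two 64-bit values using add on the low words and addc on the high words."""
    low, carry = add_variant(a & MASK32, b & MASK32, 0)        # add
    high, carry = add_variant(a >> 32, b >> 32, carry, c=1)    # addc
    return (high << 32) | low
```

The add64 helper shows the typical reason for having both add and addc: the carry produced by the low-word addition is consumed by the high-word addition.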

andi - Logical AND with Immediate

andi rD, rA, IMM

Description: The contents of register rA are ANDed with the value of the IMM field, sign-extended to 32 bits; the result is placed into register rD.

Pseudocode:
(rD) ← (rA) ∧ sext(IMM)

Registers Altered:
● rD

Latency: 1 cycle

Note: By default, Type B instructions will take the 16-bit IMM field value and sign-extend it to 32 bits to use as the immediate operand. This behavior can be overridden by preceding the Type B instruction with an imm instruction. See the following imm instruction for details on using 32-bit immediate values.
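The effect of the default sign extension is easy to overlook, so a short Python sketch may help. The function names sext16 and andi are hypothetical; the code only models the pseudocode given above for the case without a preceding imm instruction.

```python
MASK32 = 0xFFFFFFFF

def sext16(imm):
    """Sign-extend a 16-bit immediate to 32 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm

def andi(ra, imm):
    """Model andi without an imm prefix: rD = rA AND sext(IMM)."""
    return (ra & MASK32) & sext16(imm)

keeps_upper = andi(0x12345678, 0xFF00)   # immediate has bit 15 set
clears_upper = andi(0x12345678, 0x00FF)  # immediate has bit 15 clear
```

When bit 15 of the immediate is set, the sign extension fills the upper 16 bits with ones, so the upper half of rA passes through the AND unchanged; when it is clear, the upper half of the result is zero.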

imm - Immediate

imm IMM

Description: The instruction imm loads the IMM value into a temporary register. It also locks this value so that it can be used by the following instruction to form a 32-bit immediate value.

The instruction imm is used in conjunction with Type B instructions. Since Type B instructions have only a 16-bit immediate value field, a 32-bit immediate value cannot be used directly. However, 32-bit immediate values can be used in MicroBlaze. By default, Type B Instructions will take the 16-bit IMM field value and sign extend it to 32 bits to use as the immediate operand. This behavior can be overridden by preceding the Type B instruction with an imm instruction. The imm instruction locks the 16-bit IMM value temporarily for the next instruction. A Type B instruction that immediately follows the imm instruction will then form a 32-bit immediate value from the 16-bit IMM value of the imm instruction (upper 16 bits) and its own 16-bit immediate value field (lower 16 bits). If no Type B instruction follows the IMM instruction, the locked value gets unlocked and becomes useless.

Latency: 1 cycle

Notes: The imm instruction and the Type B instruction following it are atomic, hence no interrupts are allowed between them.

The assembler provided by Xilinx automatically detects the need for imm instructions. When a 32-bit IMM value is specified in a Type B instruction, the assembler converts the IMM value to a 16-bit one to assemble the instruction and inserts an imm instruction before it in the executable file.
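The way a 32-bit immediate is split and reassembled can be sketched in Python as below. Both function names are hypothetical, and the code models only the behaviour described above, not the actual Xilinx assembler.

```python
def split_immediate(value):
    """Split a 32-bit immediate into (upper half for imm, lower half for Type B)."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def effective_immediate(type_b_imm, imm_prefix=None):
    """Immediate operand seen by a Type B instruction.

    imm_prefix is the 16-bit value locked by a preceding imm instruction,
    or None when no imm instruction precedes the Type B instruction.
    """
    low = type_b_imm & 0xFFFF
    if imm_prefix is None:
        # Default behaviour: sign-extend the 16-bit field to 32 bits.
        return (low | 0xFFFF0000) if (low & 0x8000) else low
    # With a prefix: imm supplies the upper 16 bits, the Type B field the lower 16.
    return ((imm_prefix & 0xFFFF) << 16) | low

upper, lower = split_immediate(0x12348765)
```

Without the prefix, the lower half 0x8765 alone would be sign-extended, which is why the prefix is needed for any constant that does not fit a signed 16-bit field.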

bne - Branch if Not Equal

bne rA, rB      Branch if Not Equal
bned rA, rB     Branch if Not Equal with Delay

Description: Branch if rA is not equal to 0, to the instruction at the offset given by rB. The target of the branch is the instruction at address PC + rB.

The mnemonic bned will set the D bit. The D bit determines whether there is a branch delay slot or not. If the D bit is set, it means that there is a delay slot and the instruction following the branch (i.e. in the branch delay slot) is allowed to complete execution before executing the target instruction. If the D bit is not set, it means that there is no delay slot, so the instruction to be executed after the branch is the target instruction.

Pseudocode:
if rA ≠ 0 then
    PC ← PC + rB
else
    PC ← PC + 4
if D = 1 then
    allow following instruction to complete execution

Registers Altered:
● PC

Latency:
● 1 cycle (if branch is not taken)
● 2 cycles (if branch is taken and the D bit is set)
● 3 cycles (if branch is taken and the D bit is not set)

Note: A delay slot must not be used by the following: IMM, branch, or break instructions. Interrupts and external hardware breaks are deferred until after the delay slot branch has been completed.
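The branch decision and the latency figures listed above can be combined in a small Python model. The function name bne and its delay_slot argument are hypothetical; delay_slot=True stands for the bned mnemonic.

```python
MASK32 = 0xFFFFFFFF

def bne(pc, ra, rb, delay_slot=False):
    """Return (next PC, latency in cycles) for bne, or bned when delay_slot is True."""
    taken = (ra & MASK32) != 0
    if not taken:
        return (pc + 4) & MASK32, 1
    target = (pc + rb) & MASK32          # rB holds the offset relative to PC
    return target, (2 if delay_slot else 3)
```

For example, bne(0x100, 5, 0x20) gives the target 0x120 with a latency of 3 cycles, while the same call with delay_slot=True gives the same target with a latency of 2 cycles.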

cmp - Integer Compare

cmp rD, rA, rB      compare rB with rA (signed)
cmpu rD, rA, rB     compare rB with rA (unsigned)

Description: The contents of register rA are subtracted from the contents of register rB and the result is placed into register rD.

The MSB of rD is adjusted to show the true relation between rA and rB. If the U bit is set, rA and rB are considered unsigned values. If the U bit is clear, rA and rB are considered signed values.

Pseudocode:
(rD) ← (rB) + NOT(rA) + 1
(rD)(MSB) ← (rA) > (rB)

Registers Altered:
● rD

Latency: 1 cycle
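A possible Python model of cmp and cmpu is given below, following the description above: rD receives rB minus rA in two's complement, and its most significant bit is then forced to reflect whether rA is greater than rB. The names compare and to_signed are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MSB = 0x80000000

def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's complement value."""
    x &= MASK32
    return (x - (1 << 32)) if (x & MSB) else x

def compare(ra, rb, unsigned=False):
    """Model cmp (unsigned=False) and cmpu (unsigned=True); returns rD."""
    ra &= MASK32
    rb &= MASK32
    rd = (rb + (~ra & MASK32) + 1) & MASK32   # rB - rA in two's complement
    if unsigned:
        greater = ra > rb
    else:
        greater = to_signed(ra) > to_signed(rb)
    # The MSB of rD is overwritten with the true relation rA > rB.
    return (rd | MSB) if greater else (rd & ~MSB & MASK32)
```

The difference between the two variants shows up when one operand has its top bit set: 0xFFFFFFFF is -1 for cmp but the largest possible value for cmpu.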

Registers

MicroBlaze has an orthogonal instruction set architecture. It has thirty-two 32-bit general purpose registers and up to eighteen 32-bit special purpose registers, depending on configured options.

➢ General Purpose Registers
The thirty-two 32-bit General Purpose Registers are numbered R0 through R31. The register file is reset on bit stream download (reset value is 0x00000000).

➢ Special Purpose Registers

● Program Counter (PC)
The Program Counter is the 32-bit address of the executing instruction. It can be read with an MFS instruction, but it cannot be written with an MTS instruction. When used with the MFS instruction, the PC register is specified by setting Sa = 0x0000.

● Machine Status Register (MSR)
The Machine Status Register contains control and status bits for the processor. It can be read with an MFS instruction. When reading the MSR, bit 29 is replicated in bit 0 as the carry copy. The MSR can be written using either an MTS instruction or the dedicated MSRSET and MSRCLR instructions.

When writing to the MSR, the Carry bit takes effect immediately and the remaining bits take effect one clock cycle later. Any value written to bit 0 is discarded. When used with an MTS or MFS instruction, the MSR is specified by setting Sx = 0x0001.

● Exception Address Register (EAR)
The Exception Address Register stores the full load/store address that caused the exception. For an unaligned access exception this is the unaligned access address, and for a DOPB exception, the failing OPB data access address. The contents of this register are undefined for all other exceptions. When read with the MFS instruction, the EAR is specified by setting Sa = 0x0003.

● Exception Status Register (ESR)
The Exception Status Register contains status bits for the processor. When read with the MFS instruction, the ESR is specified by setting Sa = 0x0005.

● Branch Target Register (BTR)
The Branch Target Register only exists if the MicroBlaze processor is configured to use exceptions. The register stores the branch target address for all delay slot branch instructions executed while MSR[EIP] = 0. If an exception is caused by an instruction in a delay slot (i.e. ESR[DS]=1), the exception handler should return execution to the address stored in BTR instead of the normal exception return address stored in R17. When read with the MFS instruction, the BTR is specified by setting Sa = 0x000B.

● Floating Point Status Register (FSR)
The Floating Point Status Register contains status bits for the floating point unit. It can be read with an MFS instruction and written with an MTS instruction. When read or written, the register is specified by setting Sa = 0x0007.

● Processor Version Register (PVR)
The Processor Version Register is controlled by the C_PVR configuration option on MicroBlaze. When C_PVR is set to 0, the processor does not implement any PVR and MSR[PVR]=0. If C_PVR is set to 1, MicroBlaze implements only the first register, PVR0, and if it is set to 2, all 12 PVR registers (PVR0 to PVR11) are implemented.

When read with the MFS instruction, the PVR is specified by setting Sa = 0x200x, with x being the register number between 0x0 and 0xB.
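The register selector values listed above can be collected in a small lookup, shown here as a Python sketch. The names SPECIAL_REGISTERS, special_register_code and read_msr are hypothetical. The read_msr helper models only the carry copy behaviour described for the MSR, using bit-reversed numbering (bit 0 is the most significant bit).

```python
SPECIAL_REGISTERS = {
    "PC": 0x0000, "MSR": 0x0001, "EAR": 0x0003,
    "ESR": 0x0005, "FSR": 0x0007, "BTR": 0x000B,
}

def special_register_code(name):
    """Return the selector value used with MFS/MTS for a special purpose register."""
    name = name.upper()
    if name.startswith("PVR"):
        index = int(name[3:])
        if not 0 <= index <= 11:
            raise ValueError("PVR index must be between 0 and 11")
        return 0x2000 + index            # Sa = 0x200x
    return SPECIAL_REGISTERS[name]

def read_msr(stored):
    """Value seen when reading the MSR: bit 29 (carry) is replicated in bit 0."""
    carry = (stored >> (31 - 29)) & 1    # bit 29 counted from the MSB
    return (stored & 0x7FFFFFFF) | (carry << 31)
```

A value previously written to bit 0 is never returned by read_msr, which matches the statement that any value written to bit 0 is discarded.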

Pipeline Architecture

MicroBlaze instruction execution is pipelined. For most instructions, each stage takes one clock cycle to complete. Consequently, the number of clock cycles necessary for a specific instruction to complete is equal to the number of pipeline stages, and one instruction is completed on every cycle. A few instructions require multiple clock cycles in the execute stage to complete. This is achieved by stalling the pipeline.

When executing from slower memory, instruction fetches may take multiple cycles. This additional latency directly affects the efficiency of the pipeline. MicroBlaze implements an instruction prefetch buffer that reduces the impact of such multi-cycle instruction memory latency. While the pipeline is stalled by a multi-cycle instruction in the execution stage, the prefetch buffer continues to load sequential instructions. Once the pipeline resumes execution, the fetch stage can load new instructions directly from the prefetch buffer rather than having to wait for the instruction memory access to complete.

➢ Three Stage Pipeline
When area optimization is enabled, the pipeline is divided into three stages to minimize hardware cost: Fetch, Decode, and Execute.

➢ Five Stage Pipeline
When area optimization is disabled, the pipeline is divided into five stages to maximize performance: Fetch (IF), Decode (OF), Execute (EX), Access Memory (MEM), and Writeback (WB).

➢ Branches
Normally the instructions in the fetch and decode stages (as well as the prefetch buffer) are flushed when executing a taken branch. The fetch pipeline stage is then reloaded with a new instruction from the calculated branch address. A taken branch in MicroBlaze takes three clock cycles to execute, two of which are required for refilling the pipeline. To reduce this latency overhead, MicroBlaze supports branches with delay slots.

● Delay Slots
When executing a taken branch with a delay slot, only the fetch pipeline stage in MicroBlaze is flushed. The instruction in the decode stage (branch delay slot) is allowed to complete. This technique effectively reduces the branch penalty from two clock cycles to one. Branch instructions with delay slots have a D appended to the instruction mnemonic. For example, the BNE instruction does not execute the subsequent instruction (does not have a delay slot), whereas BNED executes the next instruction before control is transferred to the branch location.

A delay slot must not contain the following instructions: IMM, branch, or break. Interrupts and external hardware breaks are deferred until after the delay slot branch has been completed.

Instructions that could cause recoverable exceptions (e.g. unaligned word or halfword load and store) are allowed in the delay slot. If an exception is caused in a delay slot the ESR[DS] bit is set, and the exception handler is responsible for returning the execution to the branch target (stored in the special purpose register BTR). If the ESR[DS] bit is set, register R17 is not valid (otherwise it contains the address following the instruction causing the exception).
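To give a feeling for what the delay slot saves, the following Python sketch estimates the cycle count of a simple loop. It is a simplified model under several assumptions: every body instruction takes one cycle, the loop is closed by a single branch that is taken on every iteration except the last, and with a delay slot one useful body instruction is placed in the slot. The function name loop_cycles is hypothetical.

```python
def loop_cycles(body_instructions, iterations, use_delay_slot):
    """Estimate cycles for a loop closed by one conditional branch.

    body_instructions counts the single-cycle instructions per iteration,
    including the one placed in the delay slot when use_delay_slot is True.
    """
    taken_branch = 2 if use_delay_slot else 3   # latency of a taken branch
    not_taken_branch = 1                        # final iteration falls through
    taken_iterations = iterations - 1
    return (iterations * body_instructions
            + taken_iterations * taken_branch
            + not_taken_branch)

without_slot = loop_cycles(4, 10, use_delay_slot=False)
with_slot = loop_cycles(4, 10, use_delay_slot=True)
```

In this model the saving is one cycle per taken branch, which corresponds to the branch penalty dropping from two clock cycles to one.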

Memory Architecture

MicroBlaze is implemented with a Harvard memory architecture, i.e. instruction and data accesses are done in separate address spaces. Each address space has a 32-bit range (i.e. handles up to 4 gigabytes of instruction and data memory respectively). The instruction and data memory ranges can be made to overlap by mapping them both to the same physical memory. The latter is useful for software debugging.

Both the instruction and data interfaces of MicroBlaze are 32 bits wide and use big-endian, bit-reversed format. MicroBlaze supports word, halfword, and byte accesses to data memory.

Data accesses must be aligned (i.e. word accesses must be on word boundaries, halfword accesses on halfword boundaries), unless the processor is configured to support unaligned exceptions. All instruction accesses must be word aligned.
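The alignment rule can be expressed as a simple check on the low address bits. The following Python sketch is only an illustration; the function name is_aligned is hypothetical, and access sizes are given in bytes (1, 2 or 4).

```python
def is_aligned(address: int, size_bytes: int) -> bool:
    """Return True if an access of size_bytes is naturally aligned."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("data accesses are byte, halfword or word")
    # size_bytes is a power of two, so the low address bits must all be zero.
    return address & (size_bytes - 1) == 0


# A word access at 0x1002 is misaligned, a halfword access there is fine.
print(is_aligned(0x1002, 4), is_aligned(0x1002, 2))
```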

MicroBlaze does not separate data accesses to I/O and memory (i.e. it uses memory mapped I/O). The processor has up to three interfaces for memory accesses: Local Memory Bus (LMB), On-Chip Peripheral Bus (OPB), and Xilinx CacheLink (XCL). The LMB memory address range must not overlap with OPB or XCL ranges.

MicroBlaze has a single cycle latency for accesses to local memory (LMB) and for cache read hits, except with area optimization enabled when data side accesses and data cache read hits require two clock cycles. A data cache write normally has two cycles of latency (more if the posted-write buffer in the memory controller is full).

The MicroBlaze instruction and data caches can be configured to use 4 or 8 word cache lines. When using a longer cache line, more bytes are prefetched, which generally improves performance for software with sequential access patterns. However, for software with a more random access pattern the performance can instead decrease for a given cache size. This is caused by a reduced cache hit rate due to fewer available cache lines.

Reset, Interrupts, Exceptions, and Break

MicroBlaze supports reset, interrupt, user exception, break, and hardware exceptions. The following section describes the execution flow associated with each of these events.

The relative priority, starting with the highest, is:
1. Reset
2. Hardware Exception
3. Non-maskable Break
4. Break
5. Interrupt
6. User Vector (Exception)
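As a rough illustration of this ordering, the Python sketch below picks the event that would be served first from a set of pending events. The event names and the function highest_priority are hypothetical labels chosen for this example.

```python
# Event names in priority order, highest first, following the list above.
PRIORITY = [
    "reset",
    "hardware_exception",
    "non_maskable_break",
    "break",
    "interrupt",
    "user_vector",
]


def highest_priority(pending):
    """Return the pending event with the highest priority, or None."""
    for event in PRIORITY:
        if event in pending:
            return event
    return None


print(highest_priority({"interrupt", "break"}))
```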

The table below defines the memory address locations of the associated vectors and the hardware enforced register file locations for return addresses. Each vector allocates two addresses to allow full address range branching (requires an IMM followed by a BRAI instruction). The address range 0x28 to 0x4F is reserved for future software support by Xilinx. Allocating these addresses for user applications is likely to conflict with future releases of EDK support software.

Vectors and Return Address Register File Location
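The vector addresses and return registers named in the subsections that follow can be gathered into a small lookup structure. The Python sketch below is illustrative only: the dictionary layout and the helper vector_slots are assumptions, and "two addresses" is taken to mean two consecutive 32-bit instruction words.

```python
# (vector base address, return-address register); None means no return register.
VECTORS = {
    "reset": (0x00, None),
    "user_vector": (0x08, "Rx"),
    "interrupt": (0x10, "R14"),
    "break": (0x18, "R16"),
    "hardware_exception": (0x20, "R17"),
}

# Address range reserved for future software support (0x28 to 0x4F).
RESERVED = range(0x28, 0x50)


def vector_slots(event):
    """Return the two word addresses (IMM slot, BRAI slot) of a vector."""
    base, _ = VECTORS[event]
    return base, base + 4


print([hex(a) for a in vector_slots("interrupt")])
```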

➢ Reset
When a Reset or Debug_Rst occurs, MicroBlaze flushes the pipeline and starts fetching instructions from the reset vector (address 0x0). Both external reset signals are active high and should be asserted for a minimum of 16 cycles.

Equivalent Pseudocode
PC ← 0x00000000
MSR ← C_RESET_MSR
EAR ← 0
ESR ← 0
FSR ← 0

➢ Hardware Exceptions
MicroBlaze can be configured to trap the following internal error conditions: illegal instruction, instruction and data bus error, and unaligned access. The divide by zero exception can only be enabled if the processor is configured with a hardware divider (C_USE_DIV=1). When configured with a hardware floating point unit (C_USE_FPU=1), it can also trap the following floating point specific exceptions: underflow, overflow, float division-by-zero, invalid operation, and denormalized operand error.

A hardware exception causes MicroBlaze to flush the pipeline and branch to the hardware exception vector (address 0x20). The exception also loads the decode stage program counter value of the subsequent instruction into the general purpose register R17, unless the exception is caused by an instruction in a branch delay slot. The execution stage instruction in the exception cycle is not executed. If the exception is caused by an instruction in a branch delay slot, the ESR[DS] bit is set. In this case the exception handler should resume execution from the branch target address stored in BTR.

The EE and EIP bits in MSR are automatically reverted when executing the RTED instruction.

Exception Causes:

● Instruction Bus Exception
The instruction On-chip Peripheral Bus exception is caused by an active error signal from the slave (IOPB_errAck) or timeout signal from the arbiter (IOPB_timeout). The instruction side local memory (ILMB) and CacheLink (IXCL) interfaces cannot cause instruction bus exceptions.

● Illegal Opcode Exception
The illegal opcode exception is caused by an instruction with an invalid major opcode (bits 0 through 5 of instruction). Bits 6 through 31 of the instruction are not checked. Optional processor instructions are detected as illegal if not enabled. If the optional feature C_OPCODE_0x0_ILLEGAL is enabled, an illegal opcode exception is also caused if the instruction is equal to 0x00000000.

● Data Bus Exception
The data On-chip Peripheral Bus exception is caused by an active error signal from the slave (DOPB_errAck) or timeout signal from the arbiter (DOPB_timeout). The data side local memory (DLMB) and CacheLink (DXCL) interfaces cannot cause data bus exceptions.

● Unaligned Exception
The unaligned exception is caused by a word access where the address to the data bus has bits 30 or 31 set, or a half-word access with bit 31 set.

● Divide by Zero Exception
The divide-by-zero exception is caused by an integer division (idiv or idivu) where the divisor is zero.

● FPU Exception
An FPU exception is caused by an underflow, overflow, divide-by-zero, illegal operation, or denormalized operand occurring with a floating point instruction.

Underflow occurs when the result is denormalized. Overflow occurs when the result is not-a-number (NaN). The divide-by-zero FPU exception is caused by the rA operand to fdiv being zero when rB is not infinite. Illegal operation is caused by a signaling NaN operand or by illegal infinite or zero operand combinations.

Equivalent Pseudocode
ESR[DS] ← exception in delay slot
if ESR[DS] then
    BTR ← branch target PC
    r17 ← invalid value
else
    r17 ← PC + 4
PC ← 0x00000020
MSR[EE] ← 0
MSR[EIP] ← 1
ESR[EC] ← exception specific value
ESR[ESS] ← exception specific value
EAR ← exception specific value
FSR ← exception specific value
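The pseudocode can be mirrored in a few lines of Python to show how a handler would choose its resume address. This is a simplified sketch: the register state is a plain dictionary, the key names (MSR_EE, ESR_DS and so on) are hypothetical, and the exception specific fields (ESR[EC], ESR[ESS], EAR, FSR) are left out.

```python
def enter_hw_exception(state, in_delay_slot, branch_target=None):
    """Return a copy of state after the hardware exception entry updates."""
    new = dict(state)
    new["ESR_DS"] = 1 if in_delay_slot else 0
    if in_delay_slot:
        new["BTR"] = branch_target   # handler must resume at the branch target
        new["R17"] = None            # R17 is not valid in this case
    else:
        new["R17"] = state["PC"] + 4
    new["PC"] = 0x20                 # hardware exception vector
    new["MSR_EE"] = 0
    new["MSR_EIP"] = 1
    return new


def resume_address(state):
    """Address at which the handler should resume execution."""
    return state["BTR"] if state["ESR_DS"] else state["R17"]


after = enter_hw_exception({"PC": 0x100}, in_delay_slot=False)
print(hex(after["PC"]), hex(resume_address(after)))
```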

➢ Breaks
There are two kinds of breaks:
1. Hardware (external) breaks
2. Software (internal) breaks

● Hardware Breaks
Hardware breaks are performed by asserting the external break signal (i.e. the Ext_BRK and Ext_NM_BRK input ports). On a break, the instruction in the execution stage completes while the instruction in the decode stage is replaced by a branch to the break vector (address 0x18). The break return address (the PC associated with the instruction in the decode stage at the time of the break) is automatically loaded into general purpose register R16. MicroBlaze also sets the Break In Progress (BIP) flag in the Machine Status Register (MSR).

A normal hardware break (i.e. the Ext_BRK input port) is only handled when there is no break in progress (i.e. MSR[BIP] is set to 0). The Break In Progress flag disables interrupts. A non-maskable break (i.e. the Ext_NM_BRK input port) is always handled immediately.

The BIP bit in the MSR is automatically cleared when executing the RTBD instruction.

● Software Breaks
To perform a software break, use the brk and brki instructions.

● Latency
The time it takes MicroBlaze to enter a break service routine from the time the break occurs depends on the instruction currently in the execution stage and the latency to the memory storing the break vector.

Equivalent Pseudocode
r16 ← PC
PC ← 0x00000018
MSR[BIP] ← 1

➢ Interrupt
MicroBlaze supports one external interrupt source (connected to the Interrupt input port). The processor only reacts to interrupts if the Interrupt Enable (IE) bit in the Machine Status Register (MSR) is set to 1. On an interrupt, the instruction in the execution stage completes while the instruction in the decode stage is replaced by a branch to the interrupt vector (address 0x10). The interrupt return address (the PC associated with the instruction in the decode stage at the time of the interrupt) is automatically loaded into general purpose register R14. In addition, the processor also disables future interrupts by clearing the IE bit in the MSR. The IE bit is automatically set again when executing the RTID instruction.

Interrupts are ignored by the processor if either the break in progress (BIP) or the exception in progress (EIP) bit in the MSR is set to 1.

● Latency
The time it takes MicroBlaze to enter an Interrupt Service Routine (ISR) from the time an interrupt occurs depends on the configuration of the processor and the latency of the memory controller storing the interrupt vectors. If MicroBlaze is configured to have a hardware divider, the largest latency happens when an interrupt occurs during the execution of a division instruction.

Equivalent Pseudocode
r14 ← PC
PC ← 0x00000010
MSR[IE] ← 0
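The acceptance conditions and the entry sequence can be combined in a short Python sketch. It is a behavioural illustration only: the MSR is modelled as a dictionary of bits, and the function names are hypothetical.

```python
def interrupt_accepted(msr):
    """An interrupt is taken only if IE is set and neither BIP nor EIP is set."""
    return msr["IE"] == 1 and msr["BIP"] == 0 and msr["EIP"] == 0


def take_interrupt(pc, msr, regs):
    """Return the next PC; update R14 and MSR[IE] if the interrupt is taken."""
    if not interrupt_accepted(msr):
        return pc              # interrupt ignored, execution continues
    regs["R14"] = pc           # return address
    msr["IE"] = 0              # further interrupts disabled until RTID
    return 0x10                # interrupt vector


msr = {"IE": 1, "BIP": 0, "EIP": 0}
regs = {}
print(hex(take_interrupt(0x200, msr, regs)), regs, msr)
```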

➢ User Vector (Exception)
The user exception vector is located at address 0x8. A user exception is caused by inserting a ‘BRALID Rx,0x8’ instruction in the software flow. Although Rx could be any general purpose register, Xilinx recommends using R15 for storing the user exception return address, and using the RTSD instruction to return from the user exception handler.

Pseudocode
rx ← PC
PC ← 0x00000008

➢ Interrupt and Exception Handling
MicroBlaze assumes certain address locations for handling interrupts and exceptions as indicated in the following table. At these locations, code is written to jump to the appropriate handlers.

Interrupt and Exception Handling

The code expected at these locations is as shown below. For programs compiled without the -xl-mode-xmdstub compiler option, the crt0.o initialization file is passed by the mb-gcc compiler to the mb-ld linker for linking. This file sets the appropriate addresses of the exception handlers.

For programs compiled with the -xl-mode-xmdstub compiler option, the crt1.o initialization file is linked to the output program. This program has to be run with the xmdstub already loaded in the memory at address location 0x0. Hence at run-time, the initialization code in crt1.o writes the appropriate instructions to location 0x8 through 0x14 depending on the address of the exception and interrupt handlers.

The following is code for passing control to Exception and Interrupt handlers:

MicroBlaze allows exception and interrupt handler routines to be located at any address location addressable using 32 bits. The user exception handler code starts with the label _exception_handler, the hardware exception handler starts with _hw_exception_handler, while the interrupt handler code starts with the label _interrupt_handler.
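Because each vector holds an IMM instruction followed by a BRAI instruction, a full 32-bit handler address is carried as two 16-bit halves: the upper half by IMM and the lower half by the BRAI immediate. The Python sketch below only shows this split; the function names are hypothetical and no instruction encoding is attempted.

```python
def split_handler_address(address):
    """Split a 32-bit address into (IMM upper half, BRAI lower half)."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return (address >> 16) & 0xFFFF, address & 0xFFFF


def join_halves(high, low):
    """Rebuild the 32-bit branch target from the two 16-bit immediates."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


high, low = split_handler_address(0x12345678)
print(hex(high), hex(low), hex(join_halves(high, low)))
```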

In the current MicroBlaze system, there are dummy routines for interrupt and exception handling, which can be changed. To override these routines and link custom interrupt and exception handlers, the interrupt handler code must be defined with the attribute interrupt_handler.

Instruction Cache

MicroBlaze may be used with an optional instruction cache for improved performance when executing code that resides outside the LMB address range.

The instruction cache has the following features:
● Direct mapped (1-way associative)
● User selectable cacheable memory address range
● Configurable cache and tag size
● Caching over CacheLink (XCL) interface
● Option to use 4 or 8 word cache-line
● Cache on and off controlled using a bit in the MSR
● Optional WIC instruction to invalidate instruction cache lines

➢ General Instruction Cache Functionality
When the instruction cache is used, the memory address space is split into two segments: a cacheable segment and a non-cacheable segment. The cacheable segment is determined by two parameters: C_ICACHE_BASEADDR and C_ICACHE_HIGHADDR. All addresses within this range correspond to the cacheable address segment. All other addresses are non-cacheable.

The cacheable instruction address consists of two parts: the cache address, and the tag address. The MicroBlaze instruction cache can be configured from 64 bytes to 64 kB. This corresponds to a cache address of between 6 and 16 bits. The tag address together with the cache address should match the full address of cacheable memory. When selecting cache sizes below 2 kB, distributed RAM is used to implement the Tag RAM and Instruction RAM.

For example: in a MicroBlaze configured with C_ICACHE_BASEADDR= 0x00300000, C_ICACHE_HIGHADDR=0x0030ffff, C_CACHE_BYTE_SIZE=4096, and C_ICACHE_LINELEN=8; the cacheable memory of 64 kB uses 16 bits of byte address, and the 4 kB cache uses 12 bits of byte address, thus the required address tag width is: 16-12=4 bits. The total number of block RAM primitives required in this configuration is: 2 RAMB16 for storing the 1024 instruction words, and 1 RAMB16 for 128 cache line entries, each consisting of: 4 bits of tag, 8 word-valid bits, 1 line-valid bit. In total 3 RAMB16 primitives.
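The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch under the assumption that both the cacheable range and the cache size are powers of two; the function name cache_geometry and the returned keys are hypothetical.

```python
def cache_geometry(base_addr, high_addr, cache_bytes, line_words):
    """Compute tag width, word count and line count for a direct-mapped cache."""
    cacheable_bytes = high_addr - base_addr + 1
    cacheable_bits = (cacheable_bytes - 1).bit_length()  # byte address bits of the range
    cache_bits = (cache_bytes - 1).bit_length()          # byte address bits of the cache
    words = cache_bytes // 4                             # 32-bit words stored
    return {
        "tag_bits": cacheable_bits - cache_bits,
        "words": words,
        "lines": words // line_words,
    }


print(cache_geometry(0x00300000, 0x0030FFFF, 4096, 8))
```

For the configuration above this gives 4 tag bits, 1024 words and 128 cache lines, matching the figures in the text.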

Instruction Cache Organization

➢ Instruction Cache Operation
For every instruction fetched, the instruction cache detects if the instruction address belongs to the cacheable segment. If the address is non-cacheable, the cache controller ignores the instruction and lets the OPB or LMB complete the request. If the address is cacheable, a lookup is performed on the tag memory to check if the requested address is currently cached. The lookup is successful if the word and line valid bits are set and the tag address matches the instruction address tag segment. On a cache miss, the cache controller requests the new instruction over the instruction CacheLink (IXCL) interface, and waits for the memory controller to return the associated cache line.
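The lookup described above can be sketched as a small behavioural model in Python. It is not the hardware implementation: the class name is hypothetical, all upper address bits are stored as the tag (a superset of the real tag), and a miss is assumed to fill the whole line at once.

```python
class DirectMappedICache:
    """Behavioural model of a direct-mapped instruction cache lookup."""

    def __init__(self, base, high, cache_bytes, line_words):
        self.base = base
        self.high = high
        self.cache_bytes = cache_bytes
        self.line_words = line_words
        self.num_lines = cache_bytes // (4 * line_words)
        self.tags = [None] * self.num_lines
        self.line_valid = [False] * self.num_lines
        self.word_valid = [[False] * line_words for _ in range(self.num_lines)]

    def fetch(self, address):
        """Return 'bypass', 'hit' or 'miss' for an instruction fetch."""
        if not (self.base <= address <= self.high):
            return "bypass"                      # non-cacheable: OPB or LMB serves it
        word = (address // 4) % self.line_words
        index = (address // (4 * self.line_words)) % self.num_lines
        tag = address // self.cache_bytes
        if (self.line_valid[index]
                and self.word_valid[index][word]
                and self.tags[index] == tag):
            return "hit"
        # Miss: the line is requested over IXCL and replaces the old entry.
        self.tags[index] = tag
        self.line_valid[index] = True
        self.word_valid[index] = [True] * self.line_words
        return "miss"


cache = DirectMappedICache(0x00300000, 0x0030FFFF, 4096, 8)
print(cache.fetch(0x00300000), cache.fetch(0x00300000), cache.fetch(0x00301000))
```

Two addresses that are one cache size apart (here 4 kB) map to the same line, so fetching one evicts the other.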

➢ Instruction Cache Software Support

● MSR Bit
The ICE bit in the MSR provides software control to enable and disable caches. The contents of the cache are preserved by default when the cache is disabled. You can invalidate cache lines using the WIC instruction or using the hardware debug logic of MicroBlaze.

● WIC Instruction
The optional WIC instruction (C_ALLOW_ICACHE_WR=1) is used to invalidate cache lines in the instruction cache from an application. The cache must be disabled (MSR[ICE]=0) when the instruction is executed.

Data Cache

MicroBlaze may be used with an optional data cache for improved performance. The cached memory range must not include addresses in the LMB address range.

The data cache has the following features:
● Direct mapped (1-way associative)
● Write-through
● User selectable cacheable memory address range
● Configurable cache size and tag size
● Caching over CacheLink (XCL) interface
● Option to use 4 or 8 word cache-lines
● Cache on and off controlled using a bit in the MSR
● Optional WDC instruction to invalidate data cache lines

➢ General Data Cache Functionality
When the data cache is used, the memory address space is split into two segments: a cacheable segment and a non-cacheable segment. The cacheable area is determined by two parameters: C_DCACHE_BASEADDR and C_DCACHE_HIGHADDR. All addresses within this range correspond to the cacheable address space. All other addresses are non-cacheable.

The cacheable data address consists of two parts: the cache address, and the tag address. The MicroBlaze data cache can be configured from 64 bytes to 64 kB. This corresponds to a cache address of between 6 and 16 bits. The tag address together with the cache address should match the full address of cacheable memory. When selecting cache sizes below 2 kB, distributed RAM is used to implement the Tag RAM and Data RAM.

Data Cache Organization

For example, in a MicroBlaze configured with C_DCACHE_BASEADDR=0x00400000, C_DCACHE_HIGHADDR=0x00403fff, C_DCACHE_BYTE_SIZE=2048, and C_DCACHE_LINELEN=4; the cacheable memory of 16 kB uses 14 bits of byte address, and the 2 kB cache uses 11 bits of byte address, thus the required address tag width is 14-11=3 bits. The total number of block RAM primitives required in this configuration is 1 RAMB16 for storing the 512 data words, and 1 RAMB16 for 128 cache line entries, each consisting of 3 bits of tag, 4 word-valid bits, and 1 line-valid bit. In total, 2 RAMB16 primitives.

➢ Data Cache Operation
The MicroBlaze data cache implements a write-through protocol. Provided that the cache is enabled, a store to an address within the cacheable range generates an equivalent byte, halfword, or word write over the data CacheLink (DXCL) to external memory. The write also updates the cached data if the target address word is in the cache (i.e. the write is a cache-hit). A write cache-miss does not load the associated cache line into the cache.

Provided that the cache is enabled a load from an address within the cacheable range triggers a check to determine if the requested data is currently cached. If it is (i.e. on a cache-hit) the requested data is retrieved from the cache. If not (i.e. on a cache-miss) the address is requested over data CacheLink (DXCL), and the processor pipeline stalls until the cache line associated to the requested address is returned from the external memory controller.
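The write-through behaviour, including the fact that a write miss does not allocate a line, can be illustrated with a simplified Python model. The class below is a sketch with hypothetical names; it handles word-aligned word accesses only and models external memory as a dictionary that reads as zero where nothing has been written.

```python
class WriteThroughDCache:
    """Word-granular model of a direct-mapped, write-through data cache."""

    def __init__(self, cache_bytes, line_words):
        self.line_bytes = 4 * line_words
        self.num_lines = cache_bytes // self.line_bytes
        self.lines = {}    # line index -> (tag, {word address: value})
        self.memory = {}   # external memory: word address -> value

    def _locate(self, address):
        line_no = address // self.line_bytes
        return line_no % self.num_lines, line_no // self.num_lines

    def store(self, address, value):
        self.memory[address] = value          # always written to external memory
        index, tag = self._locate(address)
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            line[1][address] = value          # write hit: cached copy updated
            return "hit"
        return "miss"                         # write miss: no line is loaded

    def load(self, address):
        index, tag = self._locate(address)
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            return line[1][address], "hit"
        # Read miss: the whole line is fetched from external memory.
        start = address - address % self.line_bytes
        words = {a: self.memory.get(a, 0)
                 for a in range(start, start + self.line_bytes, 4)}
        self.lines[index] = (tag, words)
        return words[address], "miss"


dcache = WriteThroughDCache(2048, 4)
print(dcache.store(0x00400000, 7), dcache.load(0x00400000), dcache.load(0x00400004))
```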

➢ Data Cache Software Support

● MSR Bit
The DCE bit in the MSR controls whether or not the cache is enabled. When disabling caches the user must ensure that all the prior writes within the cacheable range have been completed in external memory before reading back over OPB. This can be done by writing to a semaphore immediately before turning off caches, and then polling it in a loop until it has been written.

The contents of the cache are preserved when the cache is disabled.

● WDC Instruction
The optional WDC instruction (C_ALLOW_DCACHE_WR=1) is used to invalidate cache lines in the data cache from an application.

Floating Point Unit (FPU)

The MicroBlaze floating point unit is based on the IEEE 754 standard:

● Uses IEEE 754 single precision floating point format, including definitions for infinity, not-a-number (NaN), and zero
● Supports addition, subtraction, multiplication, division, and comparison instructions
● Implements round-to-nearest mode
● Generates sticky status bits for: underflow, overflow, and invalid operation

For improved performance, the following non-standard simplifications are made:
● Denormalized operands (exponent = 0 with a nonzero fraction) are not supported. A hardware floating point operation on a denormalized number returns a quiet NaN and sets the denormalized operand error bit in FSR.
● A denormalized result is stored as a signed 0 with the underflow bit set in FSR. This method is commonly referred to as Flush-to-Zero (FTZ).
● An operation on a quiet NaN returns the fixed NaN: 0xFFC00000, rather than one of the NaN operands.
● Overflow as a result of a floating point operation always returns signed ∞, even when the exception is trapped.

➢ Format
An IEEE 754 single precision floating point number is composed of the following three fields:
1. 1-bit sign
2. 8-bit biased exponent
3. 23-bit fraction (a.k.a. mantissa or significand)

The fields are stored in a 32-bit word.

IEEE 754 Single Precision Format

The value of a floating point number v in MicroBlaze has the following interpretation:

1. If exponent = 255 and fraction <> 0, then v = NaN, regardless of the sign bit
2. If exponent = 255 and fraction = 0, then v = [(-1)^sign] * ∞
3. If 0 < exponent < 255, then v = [(-1)^sign] * [2^(exponent-127)] * (1.fraction)
4. If exponent = 0 and fraction <> 0, then v = [(-1)^sign] * [2^(-126)] * (0.fraction)
5. If exponent = 0 and fraction = 0, then v = [(-1)^sign] * 0

For practical purposes only 3 and 5 are useful, while the others all represent either an error or numbers that can no longer be represented with full precision in a 32 bit format.
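The five cases can be checked against real bit patterns with a short Python sketch. The function names are hypothetical; the standard struct module is used only to cross-check the formula for case 3 against the host's own single precision decoding.

```python
import struct


def classify_single(bits):
    """Classify a 32-bit pattern according to the five cases above."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 255:
        return "nan" if fraction else "infinity"        # cases 1 and 2
    if exponent == 0:
        return "denormalized" if fraction else "zero"   # cases 4 and 5
    return "normalized"                                 # case 3


def normalized_value(bits):
    """Evaluate case 3: (-1)^sign * 2^(exponent-127) * (1.fraction)."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + fraction / 2 ** 23)


def single_value(bits):
    """Decode the same pattern using the host's IEEE 754 support."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(classify_single(0x3F800000), normalized_value(0x3F800000))
print(classify_single(0xFFC00000), classify_single(0x00000001))
```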

➢ Rounding
The MicroBlaze FPU only implements the default rounding mode, “Round-to-nearest”, specified in IEEE 754. By definition, the result of any floating point operation should return the nearest single precision value to the infinitely precise result. If the two nearest representable values are equally near, then the one with its least significant bit zero is returned.
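The tie-breaking rule can be observed on a host machine by rounding double precision values to single precision. The Python sketch below assumes the host uses the IEEE 754 default rounding mode for this conversion, which is the usual case; the helper name to_single is hypothetical and the code does not run on MicroBlaze itself.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


# 1 + 2^-24 lies exactly halfway between 1.0 and 1 + 2^-23.
# The candidate whose least significant fraction bit is zero is 1.0.
print(to_single(1 + 2 ** -24) == 1.0)

# 1 + 3*2^-24 lies halfway between 1 + 2^-23 and 1 + 2^-22.
# Here the candidate with a zero least significant bit is 1 + 2^-22.
print(to_single(1 + 3 * 2 ** -24) == 1 + 2 ** -22)
```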

➢ Operations
All MicroBlaze FPU operations use the processor's general purpose registers rather than a dedicated floating point register file.

● Arithmetic
The FPU implements the following floating point operations:
addition, fadd
subtraction, fsub
multiplication, fmul
division, fdiv

● Comparison
The FPU implements the following floating point comparisons:
compare less-than, fcmp.lt
compare equal, fcmp.eq
compare less-or-equal, fcmp.le
compare greater-than, fcmp.gt
compare not-equal, fcmp.ne
compare greater-or-equal, fcmp.ge
compare unordered, fcmp.un (used for NaN)

➢ Exceptions
The floating point unit uses the regular hardware exception mechanism in MicroBlaze. When enabled, exceptions are thrown for all the IEEE standard conditions: underflow, overflow, divide-by-zero, and illegal operation, as well as for the MicroBlaze specific exception: denormalized operand error. A floating point exception inhibits the write to the destination register (Rd). This allows a floating point exception handler to operate on the uncorrupted register file.

Fast Simplex Link (FSL)

MicroBlaze can be configured with up to eight Fast Simplex Link (FSL) interfaces, each consisting of one input and one output port. The FSL channels are dedicated unidirectional point-to-point data streaming interfaces. The FSL interfaces on MicroBlaze are 32 bits wide. A separate bit indicates whether the sent/received word is of control or data type. The get instruction in the MicroBlaze ISA is used to transfer information from an FSL port to a general purpose register. The put instruction is used to transfer data in the opposite direction. Both instructions come in four flavors: blocking data, non-blocking data, blocking control, and non-blocking control.
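A single FSL channel can be pictured as a FIFO of 32-bit words, each tagged with a control bit. The Python sketch below models only the non-blocking flavors; the class name, the default depth of 16 and the use of return values to signal a full or empty channel are assumptions made for this example, and the blocking flavors (which would stall the pipeline) are not modelled.

```python
from collections import deque


class FSLChannel:
    """Unidirectional FIFO of (32-bit word, control bit) pairs."""

    def __init__(self, depth=16):
        self.depth = depth          # hypothetical FIFO depth
        self.fifo = deque()

    def put(self, word, control=False):
        """Non-blocking put: returns False if the channel is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append((word & 0xFFFFFFFF, control))
        return True

    def get(self):
        """Non-blocking get: returns (word, control) or None if empty."""
        return self.fifo.popleft() if self.fifo else None


channel = FSLChannel(depth=2)
channel.put(0x12345678)
channel.put(0x1, control=True)
print(channel.put(0x2), channel.get(), channel.get(), channel.get())
```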

➢ Hardware Acceleration using FSL
Each FSL provides a low latency dedicated interface to the processor pipeline. Thus they are ideal for extending the processor's execution unit with custom hardware accelerators.

FSL Used with HW Accelerated Function fx

This method is similar to extending the ISA with custom instructions, but has the benefit of not making the overall speed of the processor pipeline dependent on the custom function. Also, there are no additional requirements on the software tool chain associated with this type of functional extension.

Debug and Trace

MicroBlaze features a debug interface to support JTAG based software debugging tools (commonly known as BDM or Background Debug Mode debuggers) like the Xilinx Microprocessor Debug (XMD) tool. The debug interface is designed to be connected to the Xilinx Microprocessor Debug Module (MDM) core, which interfaces with the JTAG port of Xilinx FPGAs. Multiple MicroBlaze instances can be interfaced with a single MDM to enable multiprocessor debugging. The debugging features include:

● Configurable number of hardware breakpoints and watchpoints and unlimited software breakpoints
● External processor control enables debug tools to stop, reset, and single step MicroBlaze
● Read from and write to: memory, general purpose registers, and special purpose registers, except EAR, ESR, BTR and PVR0 - PVR11, which can only be read
● Support for multiple processors
● Write to instruction and data caches

Trace Overview
The MicroBlaze trace interface exports a number of internal state signals for performance monitoring and analysis. Xilinx recommends that users only use the trace interface through Xilinx developed analysis cores. This interface is not guaranteed to be backward compatible in future releases of MicroBlaze.

5. PicoBlaze™ 8-bit Embedded Microcontroller

The PicoBlaze microcontroller is optimized for efficiency and low deployment cost. It occupies just 96 FPGA slices, or only 12.5% of an XC3S50 FPGA and a minuscule 0.3% of an XC3S5000 FPGA. In typical implementations, a single FPGA block RAM stores up to 1024 program instructions, which are automatically loaded during FPGA configuration. Even with such resource efficiency, the PicoBlaze microcontroller performs a respectable 44 to 100 million instructions per second (MIPS) depending on the target FPGA family and speed grade.

The PicoBlaze microcontroller core is totally embedded within the target FPGA and requires no external resources. The PicoBlaze microcontroller is extremely flexible. The basic functionality is easily extended and enhanced by connecting additional FPGA logic to the microcontroller’s input and output ports.

The PicoBlaze microcontroller provides abundant, flexible I/O at much lower cost than off-the-shelf controllers. Similarly, the PicoBlaze peripheral set can be customized to meet the specific features, function, and cost requirements of the target application. Because the PicoBlaze microcontroller is delivered as synthesizable VHDL source code, the core is future-proof and can be migrated to future FPGA architectures, effectively eliminating product obsolescence fears. Being integrated within the FPGA, the PicoBlaze microcontroller reduces board space, design cost, and inventory.

6. PicoBlaze Microcontroller Functional Blocks

PicoBlaze Embedded Microcontroller Block Diagram

➢ General-Purpose Registers:
The PicoBlaze microcontroller includes 16 byte-wide general-purpose registers, designated as registers s0 through sF. For better program clarity, registers can be renamed using an assembler directive. All register operations are completely interchangeable; no registers are reserved for special tasks or have priority over any other register. There is no dedicated accumulator; each result is computed in a specified register.

➢ 1,024-Instruction Program Store:
The PicoBlaze microcontroller executes up to 1,024 instructions from memory within the FPGA, typically from a single block RAM. Each PicoBlaze instruction is 18 bits wide. The instructions are compiled within the FPGA design and automatically loaded during the FPGA configuration process.

Other memory organizations are possible to accommodate more PicoBlaze controllers within a single FPGA or to enable interactive code updates without recompiling the FPGA design.

➢ Arithmetic Logic Unit (ALU):
The byte-wide Arithmetic Logic Unit (ALU) performs all microcontroller calculations, including:
● basic arithmetic operations such as addition and subtraction
● bitwise logic operations such as AND, OR, and XOR
● arithmetic compare and bitwise test operations
● comprehensive shift and rotate operations

All operations are performed using an operand provided by any specified register (sX). The result is returned to the same specified register (sX). If an instruction requires a second operand, then the second operand is either a second register (sY) or an 8-bit immediate constant (kk).

➢ Flags:
ALU operations affect the ZERO and CARRY flags. The ZERO flag indicates when the result of the last operation resulted in zero. The CARRY flag indicates various conditions, depending on the last instruction executed.

The INTERRUPT_ENABLE flag enables the INTERRUPT input.

➢ 64-Byte Scratchpad RAM:
The PicoBlaze microcontroller provides an internal general-purpose 64-byte scratchpad RAM, directly or indirectly addressable from the register file using the STORE and FETCH instructions.

The STORE instruction writes the contents of any of the 16 registers to any of the 64 RAM locations. The complementary FETCH instruction reads any of the 64 memory locations into any of the 16 registers. This allows a much greater number of variables to be held within the boundary of the processor and tends to reserve all of the I/O space for real inputs and output signals.

The six-bit scratchpad RAM address is specified either directly (ss) with an immediate constant, or indirectly using the contents of any of the 16 registers (sY). Only the lower six bits of the address are used; the address should not exceed the 00 - 3F range of the available memory.
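The STORE and FETCH behaviour, including the use of only the lower six address bits, can be modelled in a few lines of Python. The class and method names are hypothetical and the sketch covers the addressing rule only, not instruction timing.

```python
class Scratchpad:
    """64-byte scratchpad RAM addressed with a six-bit address."""

    def __init__(self):
        self.ram = [0] * 64

    def store(self, value, address):
        """STORE: write a register value; only address bits 5..0 are used."""
        self.ram[address & 0x3F] = value & 0xFF

    def fetch(self, address):
        """FETCH: read a byte back into a register."""
        return self.ram[address & 0x3F]


pad = Scratchpad()
pad.store(0xAB, 0x3F)
# Address 0x7F aliases to 0x3F because the upper bits are ignored.
print(hex(pad.fetch(0x3F)), hex(pad.fetch(0x7F)))
```

The aliasing shown in the last line is the reason the address should stay within the 00 - 3F range.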

➢ Input/Output:
The Input/Output ports extend the PicoBlaze microcontroller’s capabilities and allow the microcontroller to connect to a custom peripheral set or to other FPGA logic. The PicoBlaze microcontroller supports up to 256 input ports and 256 output ports or a combination of input/output ports. The PORT_ID output provides the port address. During an INPUT operation, the PicoBlaze microcontroller reads data from the IN_PORT port to a specified register, sX. During an OUTPUT operation, the PicoBlaze microcontroller writes the contents of a specified register, sX, to the OUT_PORT port.

➢ Program Counter (PC):
The Program Counter (PC) points to the next instruction to be executed. By default, the PC automatically increments to the next instruction location when executing an instruction. Only the JUMP, CALL, RETURN, and RETURNI instructions and the Interrupt and Reset Events modify the default behavior. The PC cannot be directly modified by the application code; computed jump instructions are not supported.

The 10-bit PC supports a maximum code space of 1,024 instructions (000 to 3FF hex). If the PC reaches the top of the memory at 3FF hex, it rolls over to location 000.

➢ Program Flow Control:
The default execution sequence of the program can be modified using conditional and non-conditional program flow control instructions.

The JUMP instructions specify an absolute address anywhere in the 1,024-instruction program space.

CALL and RETURN instructions provide subroutine facilities for commonly used sections of code. A CALL instruction specifies the absolute start address of a subroutine, while the return address is automatically preserved on the CALL/RETURN stack.

If the interrupt input is enabled, an Interrupt Event also preserves the address of the preempted instruction on the CALL/RETURN stack while the PC is loaded with the interrupt vector, 3FF hex. Use the RETURNI instruction instead of the RETURN instruction to return from the interrupt service routine (ISR).

➢ CALL/RETURN Stack:
The CALL/RETURN hardware stack stores up to 31 instruction addresses, enabling nested CALL sequences up to 31 levels deep. Since the stack is also used during an interrupt operation, at least one of these levels should be reserved when interrupts are enabled.

The stack is implemented as a separate cyclic buffer. When the stack is full, it overwrites the oldest value. Consequently, there are no instructions to control the stack or the stack pointer. No program memory is required for the stack.
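The 10-bit program counter wrap-around and the overwrite-oldest behaviour of the stack can be sketched in Python. This is a simplified model with hypothetical names: a bounded deque stands in for the cyclic buffer, the saved value is taken to be the address following the CALL, and a RETURN on an empty stack is not modelled.

```python
from collections import deque

PC_MASK = 0x3FF  # 10-bit program counter


def next_pc(pc):
    """Default PC increment; 3FF hex rolls over to 000."""
    return (pc + 1) & PC_MASK


class CallReturnStack:
    """31-entry stack that silently drops the oldest entry when full."""

    def __init__(self):
        self.entries = deque(maxlen=31)

    def call(self, pc, target):
        """CALL: save the return address and jump to the target."""
        self.entries.append(next_pc(pc))
        return target & PC_MASK

    def ret(self):
        """RETURN: resume at the most recently saved address."""
        return self.entries.pop()


stack = CallReturnStack()
pc = stack.call(0x010, 0x200)
print(hex(pc), hex(stack.ret()), hex(next_pc(0x3FF)))
```

Nesting a 32nd CALL in this model discards the first return address, which is why the nesting depth must stay within 31 levels.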

➢ Interrupts:
The PicoBlaze microcontroller has an optional INTERRUPT input, allowing the PicoBlaze microcontroller to handle asynchronous external events. In this context, “asynchronous” relates to interrupts occurring at any time during an instruction cycle. However, recommended design practice is to synchronize all inputs to the PicoBlaze controller using the clock input.

The PicoBlaze microcontroller responds to interrupts quickly, in just five clock cycles.

➢ Reset:
The PicoBlaze microcontroller is automatically reset immediately after the FPGA configuration process completes. After configuration, the RESET input forces the processor into the initial state. The PC is reset to address 0, the flags are cleared, interrupts are disabled, and the CALL/RETURN stack is reset.

The data registers and scratchpad RAM are not affected by Reset.

PicoBlaze Instruction Set

Processing Data:
All data processing instructions operate on any of the 16 general-purpose registers. Only the data processing instructions modify the ZERO or CARRY flags as appropriate for the instruction. The data processing instructions consist of the following types:
● Logic instructions
● Arithmetic instructions
● Test and Compare instructions
● Shift and Rotate instructions

Some examples of these instructions follow.

Logic Instructions:
The logic instructions perform a bitwise logical AND, OR, or XOR between two operands. The first operand is a register location. The second operand is either a register location or a literal constant. Besides performing pure AND, OR, and XOR operations, the logic instructions provide a means to:
● complement or invert a register
● clear a register
● set or clear specific bits within a register

➢ Bitwise AND, OR, XOR
All logic instructions are bitwise operations. The AND operation, illustrated in the figure below, shows that corresponding bit locations in both operands are logically ANDed together and the result is placed back into register sX. If the resulting value in register sX is zero, then the ZERO flag is set. The CARRY flag is always cleared by a logic instruction.

Bitwise AND Instruction

The OR and XOR instructions are similar to the AND instruction illustrated in the figure above except that they perform an OR or XOR logical operation, respectively.
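As a rough illustration of the flag behaviour described above, the following Python sketch models the three logic instructions on 8-bit values. It is a behavioural model only, not PicoBlaze code, and the function name `logic_op` is an assumption made for this example.

```python
# Python model (not PicoBlaze code) of the 8-bit logic instructions.
# Flag rules follow the text: ZERO is set on a zero result, CARRY is cleared.

def logic_op(op, sx, operand):
    """Return (result, zero, carry) for AND, OR, or XOR on 8-bit values."""
    if op == "AND":
        result = sx & operand
    elif op == "OR":
        result = sx | operand
    elif op == "XOR":
        result = sx ^ operand
    else:
        raise ValueError("unsupported logic instruction: " + op)
    result &= 0xFF            # registers are 8 bits wide
    zero = (result == 0)      # ZERO flag is set when the result is zero
    carry = False             # CARRY flag is always cleared by a logic instruction
    return result, zero, carry

print(logic_op("AND", 0b11001100, 0b10101010))  # (136, False, False)
```

The result is written back to sX in the real processor; here it is simply returned together with the two flags.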

Complement/Invert Register: The PicoBlaze microcontroller does not have a specific instruction to invert individual bits within register sX. However, the XOR sX,FF instruction performs the equivalent operation, as shown in the figure below.

Complementing a Register Value

Invert or Toggle Bit: The PicoBlaze microcontroller does not have a specific instruction to invert or toggle an individual bit or bits within a specific register. However, the XOR instruction performs the equivalent operation. XORing register sX with a bit mask inverts or toggles specific bits, as shown in the figure below. A ‘1’ in the bit mask inverts or toggles the corresponding bit in register sX. A ‘0’ in the bit mask leaves the corresponding bit unchanged.

Inverting an Individual Bit Location

Clear Register: The PicoBlaze microcontroller does not have a specific instruction to clear a specific register. However, the XOR sX,sX instruction performs the equivalent operation. XORing register sX with itself clears register sX and sets the ZERO flag, as shown in the figure below.

Clearing a Register and Setting the ZERO Flag

The LOAD sX,00 instruction also clears register sX, but it does not affect the ZERO flag, as shown in the figure below.

Clearing a Register without Modifying the ZERO Flag
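The difference between the two ways of clearing a register can be sketched in Python. This is a behavioural model under the flag rules stated in the text, not PicoBlaze code; the `flags` dictionary and the function names are hypothetical stand-ins for the processor's ZERO and CARRY flags.

```python
# Python model (not PicoBlaze code): two ways to clear a register.
# The flags dictionary is a hypothetical stand-in for the ZERO and CARRY flags.

def xor_self(sx, flags):
    """Model XOR sX,sX: result is always 0; ZERO is set and CARRY is cleared."""
    result = (sx ^ sx) & 0xFF
    return result, {"zero": result == 0, "carry": False}

def load_zero(sx, flags):
    """Model LOAD sX,00: register becomes 0; flags pass through unchanged."""
    return 0x00, dict(flags)

before = {"zero": False, "carry": True}
print(xor_self(0x3C, before))   # (0, {'zero': True, 'carry': False})
print(load_zero(0x3C, before))  # (0, {'zero': False, 'carry': True})
```

Which form to use depends on whether the code that follows relies on the flag values left by an earlier instruction.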

Set Bit: The PicoBlaze microcontroller does not have a specific instruction to set an individual bit or bits within a specific register. However, the OR instruction performs the equivalent operation. ORing register sX with a bit mask sets specific bits, as shown in the figure below. A ‘1’ in the bit mask sets the corresponding bit in register sX. A ‘0’ in the bit mask leaves the corresponding bit unchanged.

Setting a Bit Location

Clear Bit: The PicoBlaze microcontroller does not have a specific instruction to clear an individual bit or bits within a specific register. However, the AND instruction performs the equivalent operation. ANDing register sX with a bit mask clears specific bits, as shown in the figure below. A ‘0’ in the bit mask clears the corresponding bit in register sX. A ‘1’ in the bit mask leaves the corresponding bit unchanged.

Clearing a Bit Location
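The mask idioms described above (complement, toggle, set, and clear) can be tried out with a small Python sketch. It models only the register value, not the flags, and the helper names are illustrative rather than anything defined by PicoBlaze.

```python
# Python model (not PicoBlaze code) of the bit-mask idioms described above.
# All values are 8-bit; the helper names are illustrative only.

def complement(sx):
    """XOR sX,FF inverts every bit."""
    return (sx ^ 0xFF) & 0xFF

def toggle_bits(sx, mask):
    """XOR with a mask: a 1 in the mask inverts that bit."""
    return (sx ^ mask) & 0xFF

def set_bits(sx, mask):
    """OR with a mask: a 1 in the mask sets that bit."""
    return (sx | mask) & 0xFF

def clear_bits(sx, mask):
    """AND with a mask: a 0 in the mask clears that bit."""
    return (sx & mask) & 0xFF

value = 0b10100101
print(format(complement(value), "08b"))               # 01011010
print(format(toggle_bits(value, 0b00000001), "08b"))  # 10100100
print(format(set_bits(value, 0b00010000), "08b"))     # 10110101
print(format(clear_bits(value, 0b01111111), "08b"))   # 00100101
```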

➢ Arithmetic Instructions: The PicoBlaze microcontroller provides basic byte-wide addition and subtraction instructions. Combinations of instructions perform multi-byte arithmetic plus multiplication and division operations. If the end application requires significant arithmetic performance, consider using the 32-bit MicroBlaze RISC processor core for Xilinx FPGAs.

ADD and ADDCY Add Instructions: The PicoBlaze microcontroller provides two add instructions, ADD and ADDCY, that compute the sum of two 8-bit operands, either without or with CARRY, respectively. The first operand is a register location. The second operand is either a register location or a literal constant. The resulting operation affects both the CARRY and ZERO flags. If the resulting sum is greater than 255, then the CARRY flag is set. If the resulting sum is either 0 or 256 (register sX is zero with CARRY set), then the ZERO flag is set.

The ADDCY instruction is an add operation with carry. If the CARRY flag is set, then ADDCY adds an additional one to the resulting sum.

The ADDCY instruction is commonly used in multi-byte addition. The next figure demonstrates a subroutine that adds two 16-bit integers and produces a 16-bit result. The upper byte of each 16-bit value is labeled as MSB for most-significant byte; the lower byte of each 16-bit value is labeled LSB for least-significant byte.
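As a minimal sketch of how ADD and ADDCY combine for multi-byte addition, the following Python model adds two 16-bit values held as MSB/LSB byte pairs. It is not PicoBlaze code; the function names `add`, `addcy`, and `add16` are assumptions made for this illustration, and only the CARRY flag is modelled.

```python
# Python model (not PicoBlaze code) of a 16-bit add built from ADD and ADDCY.

def add(sx, operand):
    """Model ADD: 8-bit sum, CARRY set when the true sum exceeds 255."""
    total = sx + operand
    return total & 0xFF, total > 0xFF

def addcy(sx, operand, carry_in):
    """Model ADDCY: like ADD, plus one when the incoming CARRY is set."""
    total = sx + operand + (1 if carry_in else 0)
    return total & 0xFF, total > 0xFF

def add16(a_msb, a_lsb, b_msb, b_lsb):
    """Add two 16-bit values held as (MSB, LSB) byte pairs."""
    sum_lsb, carry = add(a_lsb, b_lsb)            # low bytes first
    sum_msb, carry = addcy(a_msb, b_msb, carry)   # high bytes plus carry
    return sum_msb, sum_lsb, carry

# 0x12FF + 0x0001 = 0x1300
print(add16(0x12, 0xFF, 0x00, 0x01))  # (19, 0, False)
```

The final CARRY value indicates overflow of the 16-bit result, so the same pattern extends to wider values by chaining further ADDCY steps.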

Increment/Decrement: The PicoBlaze microcontroller does not have a dedicated increment or decrement instruction. However, adding or subtracting one using the ADD or SUB instructions provides the equivalent operation, as shown in the next figure.

Incrementing and Decrementing a Register

If incrementing or decrementing a multi-register value—i.e., a 16-bit value—perform the operation using multiple instructions. Incrementing or decrementing a multi-byte value requires using the add or subtract instructions with carry, as shown in the next figure.

Incrementing a 16-bit Value
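A 16-bit increment or decrement can be sketched in Python as follows. The model assumes the sequences ADD lo,01 followed by ADDCY hi,00, and SUB lo,01 followed by SUBCY hi,00, with CARRY acting as the borrow flag for subtraction; the function names are hypothetical and this is not PicoBlaze code.

```python
# Python model (not PicoBlaze code) of 16-bit increment and decrement.
# Assumed sequences: ADD lo,01 / ADDCY hi,00 and SUB lo,01 / SUBCY hi,00.

def inc16(hi, lo):
    """Increment a 16-bit value held as two 8-bit registers."""
    total = lo + 1                        # ADD lo,01
    lo, carry = total & 0xFF, total > 0xFF
    total = hi + (1 if carry else 0)      # ADDCY hi,00
    hi, carry = total & 0xFF, total > 0xFF
    return hi, lo, carry

def dec16(hi, lo):
    """Decrement a 16-bit value; CARRY acts as the borrow flag."""
    diff = lo - 1                         # SUB lo,01
    lo, borrow = diff & 0xFF, diff < 0
    diff = hi - (1 if borrow else 0)      # SUBCY hi,00
    hi, borrow = diff & 0xFF, diff < 0
    return hi, lo, borrow

print(inc16(0x00, 0xFF))  # (1, 0, False)
print(dec16(0x01, 0x00))  # (0, 255, False)
```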

Negate: The PicoBlaze microcontroller does not have a dedicated instruction to negate a register value, taking the two’s complement. However, the instructions in the figure below provide the equivalent operation.

Destructive Negate (2’s Complement) Function Overwrites Original Value

Another possible implementation that does not overwrite the value appears in the next figure.

Non-destructive Negate Function Preserves Original Value
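Both negate variants can be illustrated with a short Python model. The sketch assumes the usual sequences: invert with XOR sX,FF and then ADD sX,01 for the destructive form, and load a second register with 00 and subtract the value from it for the non-destructive form. The function and variable names are hypothetical, and the flags are not modelled.

```python
# Python model (not PicoBlaze code) of two's-complement negation on 8 bits.

def negate_destructive(sx):
    """Assumed sequence: XOR sX,FF then ADD sX,01 (overwrites sX)."""
    sx = (sx ^ 0xFF) & 0xFF   # one's complement: invert every bit
    sx = (sx + 1) & 0xFF      # add one to form the two's complement
    return sx

def negate_nondestructive(value):
    """Assumed sequence: LOAD sY,00 then SUB sY,sX (sX is preserved)."""
    complement = 0x00                          # clear the result register
    complement = (complement - value) & 0xFF   # 0 - value, wrapped to 8 bits
    return value, complement

print(negate_destructive(0x01))     # 255
print(negate_nondestructive(0x05))  # (5, 251)
```

The non-destructive form costs a second register but leaves the original value available for later use.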

➢ No Operation (NOP): The PicoBlaze instruction set does not have a specific NOP instruction. Typically, a NOP instruction is completely benign, does not affect register contents or flags, and performs no operation other than requiring an instruction cycle to execute. A NOP instruction is therefore sometimes useful to balance code trees for more predictable execution timing. There are a few possible implementations of an equivalent NOP operation, as shown in the next figures. Loading a register with itself does not affect the register value or the status flags.

Loading a Register with Itself Acts as a NOP Instruction

A similar NOP technique is to simply jump to the next instruction, which is equivalent to the default program flow. The JUMP instruction consumes an instruction cycle (two clock cycles) without affecting register contents.

Alternative NOP Method Using JUMP Instructions

➢ Setting and Clearing CARRY Flag: Sometimes, application programs need to specifically set or clear the CARRY flag, as shown in the following examples.

Clear CARRY Flag: ANDing a register with itself clears the CARRY flag without affecting the register contents, as shown in the next figure.

ANDing a Register with Itself Clears the CARRY Flag

Set CARRY Flag: There are various methods for setting the CARRY flag, one of which appears in the figure below. Generally, these methods affect a register location.

Example Operation that Sets the CARRY Flag
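The two flag manipulations can be modelled in Python as follows. Clearing CARRY uses AND sX,sX as described in the text. For setting CARRY, the sketch assumes one possible method, LOAD sX,00 followed by COMPARE sX,01, where COMPARE sets CARRY when the register is less than the operand; this choice of method is an assumption for illustration. The function names are hypothetical and this is not PicoBlaze code.

```python
# Python model (not PicoBlaze code) of clearing and setting the CARRY flag.

def clear_carry(sx):
    """Model AND sX,sX: value unchanged, CARRY cleared, ZERO follows the value."""
    result = (sx & sx) & 0xFF
    return result, {"zero": result == 0, "carry": False}

def set_carry():
    """Assumed method: LOAD sX,00 then COMPARE sX,01 (overwrites sX)."""
    sx = 0x00                 # LOAD sX,00 changes the register contents
    literal = 0x01
    carry = sx < literal      # COMPARE sets CARRY when sX is less than the operand
    zero = sx == literal      # COMPARE sets ZERO when the operands are equal
    return sx, {"zero": zero, "carry": carry}

print(clear_carry(0x7E))  # (126, {'zero': False, 'carry': False})
print(set_carry())        # (0, {'zero': False, 'carry': True})
```

Note that AND sX,sX still updates the ZERO flag, so it is only neutral with respect to the register contents.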

7. Third Party Real Time Operating Systems (RTOS) Support

Although the MicroBlaze lacks a Memory Management Unit and is thus unable to run full Linux, several operating systems have been ported to it, including µClinux and FreeRTOS.

➢ Xilinx MicroBlaze port on a Virtex-4 FPGA from FreeRTOS: This operating system is not included in Xilinx's Third Party Real Time Operating Systems (RTOS) Support list.

The Microblaze port was developed using the PowerPC & MicroBlaze Virtex-4 FX12 Edition Development Kit. This is a very comprehensive kit that includes:

● An ML403 development board
● All the required hardware development tools
● All the required software development tools (EDK and ISE)
● A JTAG interface
● All the required cables

The Microblaze port is intended to be as generic and widely applicable as possible. Therefore, even though the ML403 is a comprehensive development platform (including Ethernet, USB, audio, etc.), the FreeRTOS kernel places as little reliance as possible on hardware outside of the core FPGA component. The demo application that accompanies the port is configured to execute entirely from BRAM and only makes use of a basic UART component.

The downloadable demo application does not demonstrate the use of co-routines. See the co-routine documentation page for information on how co-routine functionality can be quickly added to the demonstration.

It is also important to read the notes on using the MicroBlaze soft processor core RTOS port, available from the link that points to Xilinx's web site.

Some details about the usable operating systems, according to Xilinx's web site, are:

➢ PrKernel (µITRON4.0) from eSOL Co., Ltd: PrKERNELv4 is an embedded, realtime OS fully compliant with the µITRON4.0 standard profile. PrKERNELv4 is widely used in cellular phones, digital cameras, car navigation systems, printers, and FA equipment.

➢ RTA MB from LiveDevices Ltd (ETAS Group): RTA-OSEK supports all variants of the Xilinx MicroBlaze family.

The RTA-OSEK Component for the Xilinx MicroBlaze was built using the following tools:
● GNU/Xilinx mb-gcc GNU v3.4.1/Xilinx EDK v7.1.2 Build EDK_H.12.4
● GNU/Xilinx mb-as GNU assembler v2.10.1/Xilinx EDK v7.1.1 Build EDK_H.11.3
● GNU/Xilinx mb-ld.real GNU ld v2.10.1/Xilinx EDK v7.1.1 Build EDK_H.11.3

➢ uClinux from LynuxWorks: For the Xilinx MicroBlaze™ soft-core processor solution, LynuxWorks offers support for uClinux, again utilizing LynuxWorks embedded support and services group to help get MicroBlaze developers up and running fast.

Device Family Support: Virtex-4 LX and Spartan-3E
Software Requirement: EDK Platform Studio
RTOS support: uClinux
Tool support: Compiler and Debugger
Hardware platform used in validation:
● Spartan-3E FPGAs - Xilinx SP3E1600E target single board computer (SBC)

➢ Nucleus® from Mentor Graphics ESD: The Mentor Graphics Nucleus PLUS RTOS supports both the Xilinx 32-bit MicroBlaze™ soft processor core as well as the Virtex™-4 and Virtex-II Pro embedded PowerPC™ processor.

Xilinx included an evaluation version of the Nucleus PLUS LV RTOS port for MicroBlaze in one release of the EDK that shipped to registered Xilinx customers.

Device Family Support:
● Virtex-4 FX
● Virtex-II Pro
● Virtex-II
● Spartan-3E
● Spartan-3
● Spartan-IIE

Software Requirement
● Xilinx EDK

Key Features
● RTOS support: Nucleus PLUS
● Tool support
● Nucleus NET TCP/IP protocol stack
● Nucleus FILE file system
● Nucleus SHELL terminal application
● code|lab EDE embedded development environment
● Hardware platform used in validation
● Memec Design
● Virtex-II MB 1000 system board
● MEMEC Design Part # DS-KIT-V2MB1000
● IP block used in validation: UART Lite, 10/100 EMAC

➢ µC/OS-II from Micriµm: µC/OS-II is a well-documented and reliable real-time operating system (RTOS) choice for the Xilinx MicroBlaze processor. µC/OS-II can be easily added to user's projects, and it can be configured from within Xilinx Platform Studio (XPS). The port files that adapt µC/OS-II to the MicroBlaze processor come with an application note (AN-1013), which provides a step-by-step explanation of how to use µC/OS-II with XPS projects. AN-1013 also includes a thorough description of the µC/OS-II MicroBlaze port. AN-1013.zip does not include µC/OS-II source files. These files must be obtained from Micrium.

➢ NORTi/µITRON from MiSPO: A Japanese operating system.

➢ uCLinux from PetaLogix: PetaLogix is an embedded Linux® solutions provider founded by Dr John Williams, architect and maintainer of the port of the uClinux operating system to the Xilinx® MicroBlaze® soft processor.

PetaLogix™ aims to be the premier service provider for Linux on Reconfigurable Logic devices.

PetaLogix™ maintains strong links to the Embedded Systems Group at The University of Queensland, which initiated the MicroBlaze port of uClinux, and which continues to perform world-class research on reconfigurable computing.

There are thousands of device drivers in the Linux kernel.

➢ Sierra from Prevas AB: Prevas Sierra is a unique implementation of a real-time kernel in hardware and, according to its datasheet, is therefore much faster than any software RTOS on the market.

The Prevas Sierra stand-alone RTOS supports both the IBM PowerPC and Xilinx MicroBlaze processors in Xilinx Virtex and Spartan Family FPGAs.

Device Family Support:
● Virtex-4
● Virtex-II Pro
● Virtex-II
● Virtex-E
● Spartan-3E
● Spartan-3
● Spartan-IIE
● Spartan-II

Xilinx Software Support:
● Xilinx EDK v 7.1i and higher
● Xilinx EDK v 6.3i
● Xilinx EDK v 6.2i
● Xilinx EDK v 6.1i

>> Multiprocessor support

Sierra Key Features:
● Interfaces to CoreConnect OPB and LMB buses
● Works in parallel to the CPU
● Scalable micro-kernel structure
● Deterministic - 100 % time predictable
● Plug-in architecture for adding new system components
● Full board support package with UDP, file system etc.

8. ARM Comparison

Xilinx sells a Virtex-II Pro FPGA with an embedded PowerPC 405 hard core, and the company also offers a 32-bit RISC soft core known as MicroBlaze. The hardened PowerPC 405 is much more powerful than a firm-core ARM7TDMI, but Actel’s ProASIC3 chips should be significantly less expensive.

Xilinx’s MicroBlaze soft core is similar to Altera’s Nios II, sharing many of the same advantages and disadvantages.

Altera's soft-core processor, Nios II, is designed specifically for FPGA integration and runs at 140–180MHz in a fast Stratix or Stratix II device. Nios II is a 32-bit RISC processor with a deeper pipeline than the ARM7TDMI has (six stages vs. three), and it has additional advantages: dynamic or static branch prediction, configurable instruction and data caches, and an extendable instruction set. However, Nios II has two disadvantages: all instructions are 32 bits long (the ARM7TDMI has 16-bit Thumb instructions for greater code density), and it’s a proprietary Altera architecture, not the industry-standard ARM architecture.

The following topics would be of interest to people who have to choose between Altera and Xilinx:

1. Support (How fast and how good is it?)
2. Third Party Tools (For example, which operating systems support Nios and/or MicroBlaze? What do they cost?)
3. Space (How much space does an implementation need in an FPGA?)
4. Development Suites (How easy to handle are the software suites from Altera and Xilinx?)
5. Speed (How fast are the implementations?)
6. Learning curve (How long does it take to produce the first successful work?)
7. Needed Tools (Which tools are needed for an implementation? Are these tools free?)

MicroBlaze is the better known of the two: material, examples, and operating systems for it can be found free of charge, whereas for Nios most of these must be paid for. On the other hand, Nios is easier to implement, while MicroBlaze can be very difficult to get into at the beginning.

The power consumption and performance of each core depend on the implementation, on which external devices are connected, and on which features are used. Power consumption in particular cannot be stated in general, since each vendor offers tools with which it can be calculated for a given case.
