XMOS vs. FPGA

2010-05-25 © 2010 XMOS Ltd www.xmos.com

A Programmable Revolution

A Compelling Alternative to Low Cost FPGAs

Introduction

FPGAs and CPLDs are used in many industries covering a broad range of performance requirements, price points and power envelopes.

In the early days, FPGAs were used for prototyping ASICs and for high-end, low volume applications that could bear a high unit cost, such as the communications and defence sectors. Since then, FPGA vendors have driven down costs and power, through rapid process migration, to produce new lower cost and lower power device families to address new requirements.

Evolution

In many cases low end FPGA families are now considered for power and cost sensitive consumer and industrial applications. These sectors benefit sufficiently from the flexibility and time to market advantages offered by FPGAs to warrant the price premium of programmability.

From 2009 onwards, new entrants to the programmable silicon market are starting to win the hearts and minds of designers looking for the best possible mix of solution flexibility, price and performance. Some new players, such as SiliconBlue, Achronix and Tabula, have come up with new FPGA architectures. In parallel, other vendors such as Actel and Cypress have integrated FPGA fabrics with programmable analogue blocks and microcontrollers.

All of these efforts represent an evolution of the same FPGA concept.

Revolution: Now, for the first time, there is an all-digital flexible solution that will prove to be a better, cheaper, easier and lower power solution than an FPGA for many applications: XMOS.

XMOS XS1-L

The XMOS XS1-L family of devices is based on the XMOS XCore® processor, a 500 MIPS event-driven RISC processor with 100% deterministic operation, a 32x32 multiplier, programmable I/O and a host of other resources, all programmable entirely in C++, C and XC. XC includes extensions to C for concurrency, communications, and timed I/O operations.

XMOS devices can be used as a direct substitute for low cost SRAM based FPGAs. In other cases, they provide suitable replacements for some of the higher performance flash-based FPGAs.

Figure 1 shows the XS1-L family with respect to the price, capability (capacity and performance) and power consumption of various FPGA device families. For a large number of digital processing applications, the XS1-L outperforms Altera Cyclone III, Xilinx Spartan 3A FPGAs and equivalents on both price and power consumption.

Figure 1: XS1-L compared to popular FPGA families

A single core XS1-L device offers a capacity for general digital logic implementation roughly comparable to an FPGA having 7-20K logic elements (roughly 70K-200K ASIC gates).

XMOS, the XMOS logo and XCore are trademarks of XMOS Ltd

All other trademarks are the property of their respective owners.

There will always be a place for micropower programmable devices and high-end, DSP, bandwidth intensive FPGAs such as Virtex or Stratix parts. For applications residing in the space in between, however, XMOS can improve development speed and lower costs and power consumption without compromising solution flexibility and programmability.

In addition, the XS1-L provides robust IP protection of a kind otherwise found only in flash-based FPGAs, whilst retaining performance much closer to that of an SRAM FPGA.

The rest of this paper describes how XMOS technology delivers a revolution in both the programmable silicon itself and the associated hardware design processes.

The XCore Processor

Instead of writing code in HDL to describe registers, gates and wires, designers who use XMOS technology write code in C, C++ or XC to implement deterministic processing functions, as shown in Figure 2.

Figure 2: Designing with the XCore

Parallelism

An XCore processor runs multiple real-time hardware threads simultaneously. Each thread has access to a dedicated set of general purpose registers, gets a guaranteed share of the processing power, and executes a program using common RISC-style instructions. Each thread can execute simple computational code, DSP code, control software (taking logic decisions, or executing a state machine) or handle I/O operations using intelligent I/O resources.

The eight hardware threads, generous MIPS, 100% deterministic architecture and intelligent I/O provide designers with the flexibility of HDL, while dramatically easing the design entry and verification tasks.
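As a rough illustration of the "guaranteed share" idea, the C sketch below estimates per-thread throughput. It assumes, beyond what this paper states, that a single thread can use at most one quarter of the 500 MIPS and that more than four active threads split the core evenly. The function name and the kIPS unit are illustrative only.

```c
#include <stdint.h>

/* Figures from the text: 500 MIPS per XCore, eight hardware threads. */
#define XCORE_KIPS        500000u  /* 500 MIPS in thousands of instructions/s */
#define XCORE_MAX_THREADS 8u

/* ASSUMPTION (not stated in this paper): one thread can use at most 1/4 of
 * the core, and more than four active threads share the core equally. */
#define ASSUMED_MIN_DIVISOR 4u

/* Returns the estimated guaranteed throughput of each thread in kIPS,
 * or 0 if the thread count is outside 1..8. */
static uint32_t per_thread_kips(uint32_t active_threads)
{
    if (active_threads == 0u || active_threads > XCORE_MAX_THREADS) {
        return 0u;
    }
    uint32_t divisor = active_threads < ASSUMED_MIN_DIVISOR
                           ? ASSUMED_MIN_DIVISOR
                           : active_threads;
    return XCORE_KIPS / divisor;
}
```

Under these assumptions, eight busy threads would each be budgeted roughly 62.5 MIPS, a figure that can be compared against the per-function MIPS requirements listed in Table 3.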

Threads, Memory and Channels

Threads can use channels to provide buffered, event-based communication between threads, allowing data exchange and synchronization using single cycle instructions. Alternatively, threads can share 64KB of on-chip SRAM memory to exchange data, using single cycle lock instructions to co-ordinate access.

This makes the implementation of lightweight protocol stacks that fit within the 64KB of memory (such as the uIP TCP/IP stack) essentially free when compared to an equivalent implementation in an FPGA. The FPGA version requires a soft core such as Xilinx's MicroBlaze and an external memory interface, which together consume a large portion of the FPGA capacity, and it adds an external memory chip to the BOM cost.

Task             XMOS approach                             FPGA approach
Design Capture   High level, parallel C/XC code            HDL entry: always @(posedge clock)
Resources        Instructions, threads, channels, timers   Gates, LUTs, routing
DSP              Threads, 32x32 MAC                        HDL entry, Embedded Block Multipliers

Table 1: FPGA Design Concepts and XMOS Equivalents

Time

Each XCore has ten configurable timers, which can be directly instantiated in XC and used to control program execution or I/O operations with nominal resolution of 10ns.
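The 10ns resolution translates directly into tick arithmetic. The C sketch below converts durations to timer ticks and compares tick values. The helper names are hypothetical, and the 32-bit free-running counter with wrap-around is an assumption, not something this paper specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the text: timers have a nominal resolution of 10ns per tick. */
#define TIMER_TICK_NS 10u

/* Nanoseconds to ticks, rounded up so a delay is never shorter than asked.
 * Valid for ns values up to UINT32_MAX - 9. */
static uint32_t ns_to_ticks(uint32_t ns)
{
    return (ns + TIMER_TICK_NS - 1u) / TIMER_TICK_NS;
}

/* Microseconds to ticks: 1us = 100 ticks. Overflows above about 42.9 s. */
static uint32_t us_to_ticks(uint32_t us)
{
    return us * (1000u / TIMER_TICK_NS);
}

/* ASSUMPTION: the counter is 32 bits wide and wraps around. The signed
 * difference then tells whether 'now' has reached 'deadline', provided the
 * two are less than half the counter range apart. The unsigned-to-signed
 * cast relies on two's complement behaviour, as on mainstream compilers. */
static bool tick_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}
```

A caller would add `us_to_ticks(50)` to the current timer value to obtain a deadline 50 microseconds ahead, then test it with `tick_reached`.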

I/O and Interfacing

Each XCore provides up to 64 GPIO that can be set and sampled in a single instruction via intelligent, autonomous I/O resources called Ports. Simple input and output instructions transfer data to or from I/O ports, as shown in Figure 3. More complex use of ports allows data to be serialized and de-serialized, enabling the processor to keep up with high-speed data streams. The ports can timestamp data, synchronize transfers with an external or internal clock, and schedule data to be input or output at specific times.




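#include <xs1.h>

out buffered port:1 outP = XS1_PORT_1B;
in buffered port:4 inP = XS1_PORT_4A;
clock ref = XS1_CLKBLK_REF;

int main(void) {
    int value;
    // Clock both ports from the reference clock block; outP starts at 0.
    configure_out_port_no_ready(outP, ref, 0);
    configure_in_port_no_ready(inP, ref);
    while (1) {
        inP :> value;       // sample the 4-bit input port
        if (value > 9)
            outP <: 1;      // drive the 1-bit output high
        else
            outP <: 0;      // drive the 1-bit output low
    }
}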

Figure 3: XMOS Ports Use Example

Clock Blocks are used to select the internal XCore system clock, the timer reference clock, or an external clock connected via a 1-bit port to clock a given port. Clock blocks sample incoming external clocks and then provide a variety of conditioning options (for example, delaying the clock relative to the data associated with it).

Task              XMOS approach   FPGA approach
I/O Interfacing   Ports, timers   HDL entry
Clocking          Clock blocks    Clock Management Units

Table 2: XMOS and FPGA I/O Concepts

Event-Driven Processing

The XCore processor is event-driven. Threads waiting for events do not consume any processing resources. An event can be the completion of a communication or I/O operation, the release of a lock, or a timer reaching a programmed time. Threads can wait for any one of a set of events; the first event causes the thread to start in a single instruction.

The XS1-L XCore provides an Active Energy Conservation mode in which it automatically and instantly slows the XCore clock down to a user-specified speed whenever all threads are paused. The clock returns to its normal speed as soon as any thread has new work to do.

Selecting your programmable solution

Table 3 lists a range of application function examples and compares the utilization of XCore resources and FPGA logic elements required to implement the function.

                            XS1-L                            FPGA          ASIC
Function                    Threads  MIPS  Memory  GPIO      Logic Cells   NAND2 Gates
USB2 + 2EP                  5        400   30794   12        4400          44000
Ethernet MAC + MII          5        250   9982    14        3600          36000
TCP/IP (uip)                1        50    40000   0         6100 [1]      61000
S/PDIF                      2        100   5036    2         800           8000
I2C Master                  0.5      50    3044    2         700           7000
SDRAM Interface (D8, A14)   1        100   2974    30        1100          11000

Table 3: Application Function Examples
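One way to use the Table 3 figures is a first-order budget check: add up the XS1-L columns for the functions a design needs and compare the totals with what one XCore provides according to the text (8 threads, 500 MIPS, 64KB of SRAM, 64 GPIO). The C sketch below does this. The type and function names are hypothetical, and it assumes the Memory column is in bytes, which the table does not state. Simple addition ignores any sharing or interaction between functions, so treat the result as an estimate only.

```c
#include <stdbool.h>
#include <stddef.h>

/* One row of Table 3 (XS1-L columns only). Threads are stored in
 * half-thread units so the I2C master's 0.5 thread stays an integer. */
typedef struct {
    const char *name;
    unsigned half_threads;
    unsigned mips;
    unsigned memory;   /* ASSUMPTION: the Memory column is in bytes */
    unsigned gpio;
} function_cost;

enum { FN_USB2_2EP, FN_ETHERNET_MAC_MII, FN_TCPIP_UIP,
       FN_SPDIF, FN_I2C_MASTER, FN_SDRAM_IF, FN_COUNT };

static const function_cost TABLE3[FN_COUNT] = {
    [FN_USB2_2EP]         = { "USB2 + 2EP",                10, 400, 30794, 12 },
    [FN_ETHERNET_MAC_MII] = { "Ethernet MAC + MII",        10, 250,  9982, 14 },
    [FN_TCPIP_UIP]        = { "TCP/IP (uip)",               2,  50, 40000,  0 },
    [FN_SPDIF]            = { "S/PDIF",                     4, 100,  5036,  2 },
    [FN_I2C_MASTER]       = { "I2C Master",                 1,  50,  3044,  2 },
    [FN_SDRAM_IF]         = { "SDRAM Interface (D8, A14)",  2, 100,  2974, 30 },
};

/* Single-XCore budget taken from the text. */
#define BUDGET_HALF_THREADS 16u            /* 8 threads */
#define BUDGET_MIPS         500u
#define BUDGET_MEMORY       (64u * 1024u)  /* 64KB of SRAM */
#define BUDGET_GPIO         64u

/* Returns true if the selected Table 3 functions fit within one XCore
 * when their resource figures are simply added together. */
static bool fits_one_xcore(const int *selected, size_t count)
{
    unsigned half_threads = 0, mips = 0, memory = 0, gpio = 0;
    for (size_t i = 0; i < count; i++) {
        if (selected[i] < 0 || selected[i] >= FN_COUNT) {
            return false;  /* unknown function index */
        }
        const function_cost *f = &TABLE3[selected[i]];
        half_threads += f->half_threads;
        mips += f->mips;
        memory += f->memory;
        gpio += f->gpio;
    }
    return half_threads <= BUDGET_HALF_THREADS && mips <= BUDGET_MIPS &&
           memory <= BUDGET_MEMORY && gpio <= BUDGET_GPIO;
}
```

For example, Ethernet MAC + MII, TCP/IP and an I2C master total 6.5 threads, 350 MIPS, 53026 memory units and 16 GPIO, which fits. USB2 together with the Ethernet MAC needs 10 threads and 650 MIPS, which points towards a two-core device.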

IP Protection

Each XCore has 8KB of secure one time programmable (OTP) memory, secure execution mode, the ability to load AES encrypted firmware, and the option to disable JTAG and external channel access to a secured XCore. This all adds up to a level of IP protection that cannot be matched by an SRAM FPGA.

Applications requiring robust IP protection are often forced to use a slower but more secure flash-based FPGA, which can lead to timing closure issues. XMOS XS1-L devices offer a way to meet security and performance requirements with minimal effort.

[1] Assumes that a Nios II soft core and an external memory interface are required for TCP/IP running in a Cyclone III device.




DSP

XS1-L devices offer easily accessible DSP functionality via their 500 MHz 32x32 multiplier, which delivers a sustained rate (including load/store operations) of 59 MMACS per XCore (119 MMACS peak). This is sufficient for many audio, signal control and lower end DSP tasks that need low cost and power per MMAC.
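To put the MMACS figures in context, the C sketch below estimates how many FIR filter taps one XCore could sustain at a given sample rate. It assumes one multiply-accumulate per tap per sample on a single channel and that the whole core is devoted to the filter. The function name is hypothetical.

```c
#include <stdint.h>

/* From the text: 59 MMACS sustained and 119 MMACS peak per XCore. */
#define SUSTAINED_MMACS 59u
#define PEAK_MMACS      119u

/* Upper bound on single-channel FIR taps at the given sample rate,
 * ASSUMING one MAC per tap per sample and no other work on the core.
 * Returns 0 for a zero sample rate. */
static uint32_t max_fir_taps(uint32_t mmacs, uint32_t sample_rate_hz)
{
    if (sample_rate_hz == 0u) {
        return 0u;
    }
    uint64_t macs_per_second = (uint64_t)mmacs * 1000000u;
    return (uint32_t)(macs_per_second / sample_rate_hz);
}
```

At a 48 kHz audio sample rate this gives about 1229 taps at the sustained rate and 2479 at the peak rate, before any budget is set aside for control code or I/O.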

The low cost FPGA families such as Altera Cyclone III, on the other hand, offer tens or hundreds of embedded block multipliers, which can be ganged together to create multipliers of arbitrary width. When many of these are employed in parallel, an aggregate DSP processing capability can be built up far in excess of what the XS1-L can achieve.

Consequently the FPGA provides a significant advantage for high throughput image, video processing or telecommunications infrastructure processing. For many emerging applications (such as consumer and prosumer digital audio), however, moderate DSP needs are just one item on the list of requirements alongside flexible control, low cost and integration. For these types of applications XMOS is likely to offer the ideal solution, all programmable in a high-level language.

Solution Scaling

An application that does not fit in a single XCore may be easily spread across multiple cores by selecting the two-core XS1-L2 device. Alternatively, multiple XMOS devices can be connected together by asynchronous off-chip links that join multiple XS1 processors into a single network in which threads communicate via channels.

High I/O Capability

For applications that require many hundreds of I/Os, a low cost FPGA is likely to be a preferable choice. Likewise, for very high speed native I/O capabilities such as LVDS, gigabit SERDES transceivers, SSTL2 or other exotic I/O technology, choose an FPGA.

However a large majority of applications are well served with single ended 3.3V I/O, making large amounts of high speed I/O an expensive and unneeded feature.

Soft Processors

For FPGA designs that need to employ a soft processor to implement a protocol stack, the issue becomes the amount of code memory required. For many simple protocol stacks, such as TCP/IP for simple web-servers and various I/O related standard and proprietary protocols, the 64KB of internal SRAM on the XCore is sufficient.

In these cases the XS1-L is the cost-effective choice. To achieve the above in an FPGA would require either:

A gate-hungry soft processor core, an external memory interface and an external memory chip, all of which add a sizeable penalty in device capacity, power consumption, I/O, BOM cost and board space; or

A soft processor core with additional logic cells used to implement a small code memory on the FPGA.

Many soft processor implementations may also find it impossible to achieve the clock speed required to meet processing requirements, leaving the designer to look for a product that integrates hardened 32-bit RISC cores with a suitable programmable fabric.

For applications that have code footprints well in excess of 64KB, an FPGA with external memory may be the only option.

Figure 4: Costs associated with Soft Core Usage in FPGAs




Design Flow

Figure 5 compares the standard FPGA design flow to the XMOS design flow. Overall, the XMOS design flow offers dramatically shorter iteration times and more straightforward design entry than the traditional FPGA flow.

Design Entry

Design entry is C++, C or XC using either the XDE graphical development environment or your favorite text editor. The XDE offers syntax highlighting and indenting, and can compile code, launch simulations and start debugging sessions.

Design in a High Level Language

EDA vendors have expended significant efforts to bring the advantages of high level languages to FPGA design, and still have a long way to go to deliver practical hardware design flows using C and high level synthesis.

Designers using XMOS technology, on the other hand, immediately reap the productivity benefits of coding in a high level language, yet avoid the pitfalls of high level synthesis.

Ultra Fast Compilation

Even large XMOS programs compile and link in seconds compared to the minutes or even hours required to complete a typical iteration of FPGA synthesis and place and route.

Application Timing Closure

The XS1-L implements parallelism using its instruction set and native resources, all of which reliably run at 500 MHz. Designers using XMOS have no need to check register to register timing paths across multiple design corners.

Figure 5: XMOS and FPGA Design Flows Compared

One of the most powerful attractions of the XMOS approach for FPGA designers is the ability to statically time paths through application code using the XMOS Timing Analyzer, which times critical application paths rather than register-to-register paths.

The Timing Analyzer achieves 100% coverage of enumerated constraints, unlike test-bench based simulation. For example, the Timing Analyzer can calculate the time in XCore cycles from a thread sampling a specific pattern on an input port to outputting a response on an output port. The result can be graphically displayed, highlighting the critical path through the code and automatically signing off against user specified timing constraints expressed as pragmas in the code or entered using the XTA GUI.
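The idea of timing a path through code, rather than through registers, can be shown with a toy calculation. The C sketch below takes the worst case over two branches of a hypothetical input-to-output path, converts the cycle count to nanoseconds at the 500 MHz quoted in this paper, and checks it against a deadline. It does not represent how the XMOS Timing Analyzer works internally. The cycle counts, function names and single if/else shape are assumptions for illustration, and the effect of thread scheduling on cycle counts is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the text: XS1-L resources run at 500 MHz, i.e. 2ns per cycle. */
#define CORE_CLOCK_MHZ 500u

/* Worst-case cycles for code shaped as: common work, then if/else.
 * Static timing must assume the slower branch is taken. */
static uint32_t worst_case_cycles(uint32_t common, uint32_t branch_a,
                                  uint32_t branch_b)
{
    return common + (branch_a > branch_b ? branch_a : branch_b);
}

/* Cycles to nanoseconds, rounded up. Valid for cycle counts below about
 * 4.29 million, beyond which the 32-bit multiplication would overflow. */
static uint32_t cycles_to_ns(uint32_t cycles)
{
    return (cycles * 1000u + CORE_CLOCK_MHZ - 1u) / CORE_CLOCK_MHZ;
}

/* True if the path meets a user-specified limit in nanoseconds. */
static bool meets_constraint(uint32_t cycles, uint32_t limit_ns)
{
    return cycles_to_ns(cycles) <= limit_ns;
}
```

With hypothetical counts of 40 common cycles and branches of 12 and 30 cycles, the worst case is 70 cycles, or 140ns, which would pass a 150ns constraint and fail a 100ns one.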

For FPGA designers to access similar functionality they must deploy property checkers and formal proof methods, which rapidly reach their limits on even moderately sized designs, and require specialist design knowledge to apply.

The Timing Analyzer offers a whole-application timing capability that does not rely on time-consuming dynamic simulation, which software and hardware engineers alike will appreciate.

Simulation

Designers have the option to run XCore simulations of their code, visualizing the results with the XMOS VCD waveform viewer and debugging and single stepping with the debugger, all built into the XDE graphical environment.

The signals displayed in the VCD viewer are a range of actual signals that exist within the XS1-L silicon including program counters, port resource signals, timers, channels and thread status.

These simulations run an order of magnitude faster than a corresponding dynamic simulation in an event-driven HDL simulator. XSIM also provides a range of simple testbench plug-ins and an API for the user to create more of their own.

Bitstream Generation

After the design is ready, firmware for downloading to configuration flash memories is easily generated with XFLASH, which includes provision for multiple boot images and Dynamic Field Upgrade (DFU).

XBURN can be used to burn parts of the code image and selected user encryption keys to the 8KB of OTP on chip, or just set security options such as disabling JTAG debug access.

In System Debug

XMOS offers a typical processor debugging environment using XGDB (built on top of gdb, the GNU Debugger) and the XS1-L JTAG interface.

Debug iterations with XMOS tools only require a recompile and regeneration of firmware. FPGA designers must pre-select the nodes they wish to view and iterate through synthesis, place and route and timing analysis for each debug iteration.

PCB Design considerations

XMOS offers its processors in QFP, QFN and BGA packages, suitable for 2 layer and 4 layer PCB implementations.

In addition, the XS1-L parts require only two voltage supplies, a 3.3V or 2.5V supply for the I/O, and a 1V core voltage.

The various port/pin configurations that can be realized with the XS1-L also offer some late pin assignment flexibility although not to the same fine degree offered by FPGAs.

Toolchain Simplicity and Platform Support

Full FPGA design tool chains from the FPGA vendors and/or third party EDA suppliers run to multiple gigabytes of data.

The XMOS tools typically require only about 200 megabytes and work out of the box on Windows, Linux and Mac platforms, allowing you to develop your applications on desktop PCs or notebooks.

Summary

XMOS offers a lower cost and more secure platform, with a dramatically shorter time-to-market, than traditional SRAM and flash-based FPGAs for programmable digital logic designs in the 70K to 400K gate range.
