QuickPlay: Software-Defined FPGA Platforms for Hardware-Augmented Applications – November, 2015 www.quickplay.io 1

QuickPlay: Software-Defined FPGA Platforms for Hardware-Augmented Applications

QuickPlay is a unique design tool that allows software and hardware designers to develop and implement systems that include custom FPGA hardware, while doing no hardware design and requiring little to no hardware expertise. While QuickPlay has been streamlined for use by those without a hardware background, it can also dramatically improve a hardware engineer’s productivity.

Many tools have attempted this goal in the past, without success, and that history means that systems engineers may well look more critically at tools, like QuickPlay, that promise such a capability. The purpose of this whitepaper is to explain how QuickPlay is different from what has come before, and why it is capable of achieving this elusive, yet coveted, goal.

The discussion will start by defining the design problem that QuickPlay addresses and then examine the challenges of solving that problem, including critical areas where past attempts have come short of the promised goal. We then review the key characteristics that allow QuickPlay to be successful, followed by a high-level overview of how one designs a system using QuickPlay.

The Case for FPGAs

While CPUs are common, flexible, and familiar, CPU performance has struggled to keep up with the demands of increasingly complex algorithms and exploding volumes of data. Fueling the Internet of Things (IoT) and Big Data clearly requires new computing paradigms.

FPGAs, on the other hand, provide unrivalled performance while maintaining a level of flexibility that software developers are used to. FPGAs allow hardware implementations that can be designed and redesigned without the expense of custom silicon.

In addition, energy consumption has become a leading consideration, whether for controlling operating costs in a data center or for maintaining the life of a battery. FPGAs can provide targeted functions that achieve their performance using much less energy than a CPU would require.

Finally, FPGAs provide a broad range of hardware options that are typically not available with a CPU. System developers using FPGAs can make more flexible decisions regarding I/O and memory, and have much more control over raw speed, overall bandwidth, and latency. QuickPlay leverages these resources, turning FPGAs into software-defined platforms that yield hardware benefits with no hardware design work.

The Tension between Software and Hardware

Many more systems could benefit from higher hardware content, and yet there is strong resistance to using hardware. The reason is that hardware design requires a very different skill set and thought process from software design.

Software designers focus on functionality: how is my data being manipulated? It is flow-oriented and algorithmic in nature. Hardware designers, by contrast, focus on structure. Which components are required? How should they be configured? How will they be interconnected and synchronized? This includes signals and busses and clocks and resets, which have no software counterparts.

The following table summarizes some of the key differences between software and hardware design.

| | Software Design | Hardware Design |
|---|---|---|
| Programming model | Functional models (control flow, data flow). Software languages specify a sequence of data manipulations. | Structural specification. Hardware languages specify how specific components (CPU, memory, bus, DSP, engines, peripherals, etc.) should be assembled. |
| Abstraction | High. Software languages abstract away the underlying execution engine. | Low. Hardware languages define the execution engine (Boolean gates, registers, state machines, wires, clocks, resets). |
| Concurrency | Sequential. Software naturally represents sequential operation. There’s limited syntactical or semantic support for concurrency. | Massively parallel. All specified structures execute concurrently. |
| Notion of time | Untimed. Software languages have no concept of time. Timing is established by the execution engine. | Explicitly timed. Hardware timing is explicitly defined by the hardware designer through the instantiation of clocked registers. |
| Languages | Universal: C, C++, Java, etc. | Specialized: Verilog, VHDL, SDC, etc. |
| Design tools | Ubiquitous. Easy-to-use open-source software tools are generally available for free. | Specialized. Hardware tools tend to be proprietary, expensive, and complex. |
| Verification | Easy. Software verification is done at the source level. | Extremely difficult. Hardware verification involves low-level analysis, typically using electrical waveforms. |

Conceptually, it’s hard to imagine software and hardware design being more different.

For this reason, hardware design is typically done by engineers with a different mindset and with different tools from those used by software engineers. This hardware expertise is scarce, which prevents many companies that lack it from embarking on the path of leveraging custom hardware, and limits how far companies that have it can expand their use of such hardware.

From a practical standpoint, hardware design departments tend to be organized away from software groups, and it’s a truism that the two operate in isolated silos with minimal communication. That’s improving as companies become more efficient, but the fact remains that including hardware design in a project has traditionally reflected significant extra effort, cost, and delay.

Design visionaries have long dreamt of a means whereby a software engineer could create hardware without transforming into a hardware designer. This requires a significant level of abstraction, and it is explicitly the problem that QuickPlay solves. QuickPlay is not the first attempt at developing such a system design tool; it’s merely the first that has succeeded.

A History of Partial Solutions

Because software engineers deal with functions that operate on data, it’s natural that their attempts to create hardware will have a similar focus. Specifically, the hardware they will want to create will, in theory, be functionally isomorphic with the original software function.

Let’s imagine an algorithm that can be broken into two functions, which we’ll call Function 1 and Function 2. From a software designer’s standpoint, this is simple and can be represented as shown in Figure 1.

Figure 1 - Functions to be performed on data

In a software version, a program would call Function1(), followed by a separate call to Function2(). If we want to accelerate these two software functions in hardware, we need to create custom hardware from the software. If we want to automate this, then we need a tool that can make this transformation.
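As a minimal sketch of that software version, assume two placeholder functions whose names and arithmetic are purely illustrative (they are not taken from QuickPlay or from the figures). The point is the call structure, and the fact that nothing in the code says where the data comes from or goes to:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for "Function 1" and "Function 2" in Figure 1.
// The arithmetic is arbitrary; only the call structure matters.
std::vector<std::uint32_t> function1(const std::vector<std::uint32_t>& in) {
    std::vector<std::uint32_t> out;
    out.reserve(in.size());
    for (std::uint32_t v : in) out.push_back(v + 1);   // first manipulation
    return out;
}

std::vector<std::uint32_t> function2(const std::vector<std::uint32_t>& in) {
    std::vector<std::uint32_t> out;
    out.reserve(in.size());
    for (std::uint32_t v : in) out.push_back(v * 2);   // second manipulation
    return out;
}

// The software version: one call, followed by a separate call.
std::vector<std::uint32_t> process(const std::vector<std::uint32_t>& data) {
    return function2(function1(data));
}
```

An HLS tool would typically be pointed at the bodies of function1 and function2; the surrounding program, and any notion of I/O, clocks, or control, lies outside what it sees.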

Such tools exist, and they’re called High-Level Synthesis (HLS) tools. They take parts of high-level C/C++ programs that have no hardware notions and turn them into hardware models in a Hardware Description Language (HDL). Figure 2 below shows the two stand-alone hardware “Kernels” generated by an HLS tool from Function 1 and Function 2.

Figure 2 - Hardware kernels compiled with HLS tool

However, as the following discussion will show, HLS tools, despite being very efficient, provide only part of the solution. As we’ll see, much more than HLS is needed to build custom hardware that can be used in a system.

Let’s start with the data being processed. Where does it come from? Perhaps we will acquire data from an Ethernet network and, after processing, deliver it to a host CPU through a PCI Express link. Figure 1 above represents a complete picture of the system to the software engineer, and yet nothing in it provides any clue of how the data is acquired or sent off. So we need to add those to the picture. Figure 3 includes the hardware components necessary to read and write data to and from their respective input and output ports.

Figure 3 - Data I/O added to system.

In a realistic hardware system, each of these blocks needs control in order to function coherently. Such control is implicit in a system with a CPU, but it must be explicitly designed in a hardware system. Figure 4 includes that circuitry.

Figure 4 - Control block required for coherent operation

A real system may also have other peripherals – UART and LEDs and buttons and keyboards, and such. They must be accounted for in the hardware design, as shown in Figure 5.

Figure 5 - Peripherals and other such hardware will affect operation

These capture the obvious hardware elements needed to support the main functions. But underlying these is an important notion that is typically missing from the software domain: low-level timing and system startup. These involve clocks and reset signals, and they drive every block in the system. There may even be multiple clocks with different frequencies, each having its own “domain.” Managing those domains properly is critical to ensuring system stability and data integrity. Figure 6 shows the required infrastructure for that.

Figure 6 - Clock and reset signals are critical to all parts of the hardware.

Compare Figure 6 with Figure 2, and recall that traditional HLS tools cover only what’s in Figure 2. If we were to compare this to automobile design, it’s as if HLS tools create engines, which is useful, but much more is needed in order to design complete cars. As we can see here, HLS is beneficial, but far from sufficient.

Up to here, we’ve represented an abstract design process where the designer builds everything, including the board, from scratch. But, in practice, it’s common to make use of existing off-the-shelf boards to save time. That requires further work to map the design onto the resources of the selected board, generating the appropriate clocks and other board-related tasks. This work is simpler than designing a new board, but it is explicitly a hardware task that can be challenging for software engineers. It can still take many weeks for an experienced hardware designer to bring up a design on a new board.

Finally, no real design is ever completed without the need to debug errors. There are two domains within which these errors could occur. First, the original functions may not have been specified perfectly, and so they may require debug and iteration to correct problems. This is entirely within the scope of the software engineer’s skills. Second, the support hardware, if not designed carefully, could also contain bugs, and there would be no way for a software engineer to debug this.

Software and hardware debug are very different. In particular, hardware debug tools revolve around probing specific wires and observing the waveforms and timing to deduce where the bugs are. These tools and notions will not be familiar to the typical software engineer.

Therefore, for a design methodology to be useful to a software engineer, any debug must involve only software that the designer has explicitly created, and this debug process must be achievable using standard software development methodologies and tools. No hardware debug – of the functions or of the support hardware – should ever be required.

In summary, if a tool enables software engineers to augment their applications with custom hardware, it needs the following characteristics:

Be readily usable for a software engineer who has no hardware expertise

Be able to create functional hardware from pure software code

Be able to incorporate existing hardware IP blocks if they are available

Be able to infer and create all of the support hardware - interfaces, control, clocks, etc.

Be able to support the use of commercial off-the-shelf boards and custom boards

The inferred support hardware must be correct by construction so that it requires no hardware debug

Debug of functional blocks must be performed using standard software debug tools only, with no hardware level debug

The ambitious dream of allowing software engineers to create hardware has remained an elusive, yet coveted, goal.

A Software-Centric Methodology

The overall process of implementing a design using QuickPlay is straightforward. It consists of:

1. Developing a C/C++ functional dataflow model of the hardware engine
2. Verifying the functional model with standard C/C++ debug tools
3. Specifying the target FPGA boards and interfaces (PCIe, Ethernet, DDR, QDR, etc.)
4. Compiling the HW engine

That’s all you need to get working hardware. However, in order for this simple process to work seamlessly, the generated hardware engine must be guaranteed to function identically to the original software model. Another way of stating this is that the functional model must be deterministic so that, no matter whether executed in software or in any possible hardware implementation, execution will give the same results, albeit at different speeds.

Unfortunately, most parallel systems suffer from non-deterministic execution. Multi-threaded software execution, for example, depends on the CPU, on the OS and on non-related processes running on the same host. Multiple runs of the same multi-threaded program can produce different behaviors. Such non-determinism in hardware would be a nightmare, as it would require debugging the hardware engine itself, at the electrical waveform level.

To eliminate this debug abstraction paradox, QuickPlay promotes an intuitive dataflow model that mathematically guarantees deterministic execution, regardless of the execution engine. Such a model consists of concurrent functions, called “kernels”, communicating through streaming channels, which correlates well with how you might sketch your application on a whiteboard. In order to guarantee deterministic behavior, these kernels must communicate with each other in a way that prevents data hazards, such as race conditions. This is achieved with streaming channels that are:

FIFO-based

Blocking read and non-blocking write

Point-to-point

Some of you may recognize these as the characteristics of a Kahn Process Network (KPN) – which is indeed the model of computation QuickPlay is built upon.
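The three channel properties can be sketched in ordinary C++ threads. This is an illustration only, not the QuickPlay library: the `Channel` class, its `write`/`read`/`close` methods, and the `run_network` helper are hypothetical names, and the FIFO is assumed to be unbounded so that writes never block.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical point-to-point FIFO channel: one writer, one reader.
// NOT the QuickPlay API; it only illustrates the three rules above.
template <typename T>
class Channel {
public:
    // Non-blocking write: append to the (unbounded) FIFO and wake the reader.
    void write(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one();
    }
    // Signals that the single producer will write nothing more.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_one();
    }
    // Blocking read: waits until a token is available; returns nullopt
    // once the channel has been closed and drained.
    std::optional<T> read() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// A source and two kernels, each on its own thread:
// source -> add_one -> double_it -> sink (the sink runs on the caller's thread).
std::vector<int> run_network(const std::vector<int>& input) {
    Channel<int> a, b, c;
    std::vector<int> output;
    std::thread source([&] { for (int v : input) a.write(v); a.close(); });
    std::thread add_one([&] { while (auto v = a.read()) b.write(*v + 1); b.close(); });
    std::thread double_it([&] { while (auto v = b.read()) c.write(*v * 2); c.close(); });
    while (auto v = c.read()) output.push_back(*v);
    source.join();
    add_one.join();
    double_it.join();
    return output;
}
```

However the operating system schedules the threads, `run_network` returns the same sequence for the same input, because a kernel can only wait for data on a channel and can never test whether data is present. That restriction is what makes the network's result independent of timing.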

Figure 7 - Kahn Process Network example in QuickPlay.

The contents of any kernel can be arbitrary C/C++ code. Kernels can also be defined hierarchically, with one kernel containing a sub-network of kernels rather than code. Each kernel can then be defined as:

A C-function compiled to hardware through an HLS engine, whether the QuickPlay HLS engine or your FPGA vendor HLS engine, or

An existing piece of hardware IP, defined using a hardware description language, along with an accompanying C functional model.

QuickPlay then features a straightforward design flow, as shown in Figure 8 below.

Figure 8 – QuickPlay compilation and execution Flow.

A Simple 6-step process

The following steps describe the QuickPlay design flow in more detail.

Step 1: Pure software design

This is where you create your kernels, write C code to define their functional behavior, and connect them together with streams. The QuickPlay Eclipse-based IDE provides a C/C++ library with simple APIs to:

Create kernels, streams, streaming ports and memory ports

Read and write to/from streaming ports and memory ports

In addition, the QuickPlay IDE provides an intuitive graphical editor that lets you program the way you think - visually. Figure 7 is a screenshot of a simple design built within QuickPlay.

Step 2: Functional verification

In this step, the focus is on making sure that the software model written in Step 1 works correctly. This is done by compiling the software model on your desktop, executing it with test inputs, and verifying the correctness of the outputs. The software model is parallel, with a distinct thread for each kernel, but because QuickPlay KPN modeling provides deterministic execution, you don’t have to worry about the concurrency and can focus on the true functional bugs.

Debugging your software model is done with standard software debug techniques and tools – break points, watch points, step-by-step execution, printf, etc. You’ll probably be running more tests once it’s in hardware, which will likely uncover more bugs, but we’ll deal with that shortly.

From a design flow standpoint, this is where you do all of your verification. You will not need any further debugging at the hardware level.

It’s also important to remember that the functional model involves none of the infrastructure. In the example above, the focus is on the contents of Figure 1. None of the system aspects added in Figure 3 through Figure 6 – communication components, control plane, clocking & resets, etc. - are in play during this modeling and verification phase.

Step 3: Hardware generation

This is where you generate hardware from your software model. To do this, you:

Select which FPGA board to target. QuickPlay can implement designs on a growing selection of off-the-shelf boards. These boards typically feature leading edge Altera or Xilinx FPGAs, PCIe 3.0 link, 10Gb Ethernet, application specific interfaces, DDR3/4 SDRAM, QDR2+ SRAM, Flash memory and more. Selection is done through a simple menu in the QuickPlay tool.

Figure 9 – Off-the-shelf FPGA board

Map your input and output ports to the board’s physical interfaces. This is done through simple menu selections. Some of these interfaces can be:

o PCI-Express
o TCP/IP, UDP over 10Gb Ethernet
o HD SDI, HDMI, DisplayPort, SMPTE 2022
o DDR3, DDR4, QDR2+, flash memory
o …

Some of these protocols may require some minimal user information, like a MAC and IP address for a TCP/IP interface.

For every board supported in QuickPlay, the corresponding interfaces are available for the user to select and use within their design.

Selecting the communication protocol automatically invokes not only the hardware IP required to implement the connection, but also any software stacks layered over it so that the complete system is instantiated.

For FIFOs and memories, select whether to use FPGA internal SRAM or on-board external memory.

Push the “Build” button. This will compile software, run the HLS tool, create the system hardware, and run any other tools necessary to build the hardware images that the board will require. No manual intervention is required to complete this process.

Step 4: System execution

This is similar to the execution of the functional model in Step 2, except that now the application is running in hardware and software on a real system. This means that you can stream real data in, dramatically improving the verification coverage of your function. Because this will run so much faster, and because you can use live data sources, you are now able to run many more tests at this stage than you could during Step 2 – Functional verification - and therefore dramatically increase your test coverage.

Step 5: System debug

Because you’re running so many more tests now, you’re likely to uncover functional bugs that weren’t uncovered in Step 2. So how do you debug those new bugs? As mentioned before, you never have to debug at the hardware level, even if the bug is discovered after executing a function in hardware. Because QuickPlay guarantees that the generated hardware is functionally equivalent to the software model, a bug discovered at this stage actually reflects a bug in the original algorithm. Therefore any bug in the hardware version has to exist in the software version as well. This is why you don’t need to debug in hardware; you can debug exclusively in the software domain.

But you do need to have a way to identify the test sequence that failed in hardware so that you can run that identical test sequence on the software functional model. QuickPlay captures the hardware tests as they run and can then import any test back into the software environment where you actually do your debug.
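The replay idea can be sketched as follows. The paper does not describe the capture format or the import mechanism, so everything here is an assumption for illustration: `CapturedTest`, `Model`, `first_mismatch`, and the deliberately buggy `buggy_model` are hypothetical names, and a test is assumed to be nothing more than an input stream plus the output the test bench expected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical record of one test captured while the system ran in hardware.
struct CapturedTest {
    std::vector<int> input;      // stream that was fed to the design
    std::vector<int> expected;   // output the test bench expected
};

using Model = std::function<std::vector<int>(const std::vector<int>&)>;

// Replays the captured input through the software functional model and
// returns the index of the first output token that differs from the
// expectation, or nullopt if the model output matches in full.
std::optional<std::size_t> first_mismatch(const Model& model, const CapturedTest& t) {
    const std::vector<int> actual = model(t.input);
    const std::size_t n = std::min(actual.size(), t.expected.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (actual[i] != t.expected[i]) return i;
    }
    if (actual.size() != t.expected.size()) return n;   // length mismatch
    return std::nullopt;
}

// Example model with a deliberate bug: negative results should be clamped to 0.
std::vector<int> buggy_model(const std::vector<int>& in) {
    std::vector<int> out;
    for (int v : in) out.push_back(v * 2);   // bug: no clamping
    return out;
}
```

Once the failing index is known, an ordinary software debugger can be attached to the model with a breakpoint conditioned on that token, which is the kind of workflow the text describes.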

This is possible because the hardware system is automatically provisioned with infrastructure for observing all of the critical points of the design. Figure 10 shows the system of Figure 6 with added debug circuitry. Without QuickPlay, some sort of debug infrastructure would have to be inserted and managed by hand; with QuickPlay, this all becomes transparent.

Figure 10 - Debug infrastructure is automatically created.

The overall process, then, as illustrated in Figure 11, is to model in software, then build the system and test in hardware. If there are any bugs, import the failing test sequences back into the software environment, debug there, fix the source code, and then repeat the process. This represents a dramatic productivity improvement over traditional flows.

Figure 11 - Debug happens only in software domain

Step 6 (Optional): System optimization

Once you have completed the debug phase, you are done: your system is complete. However, you may want to make some performance optimizations, and this is the time to do that – when you know that your system is running correctly.

The first optimization you should consider is to refine your functional model. There are probably additional concurrency opportunities available, for example, so you may try decomposing or refactoring the functions a different way. At this level, optimizations can yield spectacular performance improvements.

Second, you may want to try a different FPGA board. Because the mapping from the functional model to the board is so easy, it’s very simple to try a variety of boards to select the optimal one.

The third optimization has to do with the hardware kernels that QuickPlay creates via HLS. While the resulting hardware is guaranteed to operate correctly and efficiently, it may not operate as efficiently as hardware hand-crafted by a hardware engineer. So at this stage, you have several options:

Optimize your code and tune QuickPlay HLS settings to improve the generated hardware.

Choose a 3rd party HLS tool to generate more efficient hardware.

Have a hardware team hand-craft the most critical blocks.

None of these steps is required, but they provide options when you need better hardware but have limited hardware design resources available. A hardware engineer may be able to help with these optimizations. Once any of these changes is made, the build process is simply repeated.

A Universal Streaming Conduit

QuickPlay enables rapid design of hardware-augmented applications, with broad software architecture flexibility. It is based on a data-flow model of computation where data moves through streaming channels that can have many different physical incarnations:

| Streaming Type | Physical Media |
|---|---|
| Kernel to Kernel | FPGA fabric |
| Kernel to FPGA SRAM memory | FPGA fabric |
| Kernel to DDR memory | DDR link |
| Kernel to QDR memory | QDR link |
| Kernel to embedded CPU | FPGA fabric |
| Kernel to external CPU | PCIe link; TCP/IP Ethernet network |

QuickPlay provides a universal streaming API that entirely abstracts away the underlying physical communication protocol. Streaming data is received using the ReadStream() function and sent using the WriteStream() function. These functions can be used to send and receive data between kernels, to embedded or board-level memory, and to an embedded or external host CPU, thus providing broad architectural flexibility with no need to comprehend or manage the underlying low-level protocols.

The hardware through which that data arrives and departs is determined by the selected protocol. Selecting the desired protocol sets up not only the hardware needed to implement the protocol, but also the software stacks required to support the higher protocol layers, as shown in Figure 12 below.

Figure 12 - Universal Streaming API

The exact implementation of these reads and writes – size, alignments, marshaling, etc. – is managed by QuickPlay. The most important characteristic of the ReadStream and WriteStream statements is that they’re blocking: when either statement is encountered, execution will not pass to the next statement until all of the expected data has been read or written. This is important for realizing the determinism of the algorithm.

The “binding” between the generic ReadStream and WriteStream statements and the actual underlying protocol hardware occurs at run time via the QuickPlay Library. Not only does this keep the communication details from cluttering up the software program, it also provides modularity and portability. The communication protocol can easily be changed without requiring any changes to the actual kernel code or host software. The ReadStream and WriteStream statements will automatically bind to whichever protocol has been selected with no effect on program semantics.

As a result of the abstraction that QuickPlay provides, the software algorithms remain pure, focusing solely on data manipulation in a manner that’s completely independent of the underlying communication details.
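To make the binding idea concrete, here is a minimal single-threaded sketch. The paper does not give the signatures of ReadStream() and WriteStream(), so the abstract `Stream` interface, its `read`/`write` methods, the two backends, and the `invert_kernel` and `run` helpers below are all hypothetical. Blocking is not modeled here; a read that cannot be satisfied simply reports failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical abstract stream; the real ReadStream()/WriteStream()
// signatures are not given in the paper, so these are assumptions.
class Stream {
public:
    virtual ~Stream() = default;
    // Reads exactly `count` bytes into `dst`; returns false if unavailable.
    virtual bool read(std::uint8_t* dst, std::size_t count) = 0;
    virtual void write(const std::uint8_t* src, std::size_t count) = 0;
};

// Backend 1: plain byte FIFO (stand-in for an on-chip channel).
class FifoStream : public Stream {
public:
    bool read(std::uint8_t* dst, std::size_t count) override {
        if (bytes_.size() < count) return false;
        for (std::size_t i = 0; i < count; ++i) {
            dst[i] = bytes_.front();
            bytes_.pop_front();
        }
        return true;
    }
    void write(const std::uint8_t* src, std::size_t count) override {
        bytes_.insert(bytes_.end(), src, src + count);
    }
private:
    std::deque<std::uint8_t> bytes_;
};

// Backend 2: packetized transport (stand-in for a network link). Each write
// becomes one packet; reads reassemble bytes across packet boundaries.
class PacketStream : public Stream {
public:
    bool read(std::uint8_t* dst, std::size_t count) override {
        if (available_ < count) return false;
        for (std::size_t i = 0; i < count; ++i) {
            while (packets_.front().empty()) packets_.pop_front();
            dst[i] = packets_.front().front();
            packets_.front().pop_front();
        }
        available_ -= count;
        return true;
    }
    void write(const std::uint8_t* src, std::size_t count) override {
        packets_.emplace_back(src, src + count);
        available_ += count;
    }
private:
    std::deque<std::deque<std::uint8_t>> packets_;
    std::size_t available_ = 0;
};

// Kernel code sees only the abstract Stream, never the transport.
void invert_kernel(Stream& in, Stream& out, std::size_t n) {
    std::uint8_t b = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!in.read(&b, 1)) return;
        b = static_cast<std::uint8_t>(~b);
        out.write(&b, 1);
    }
}

// Feeds `data` through the kernel using whichever backends are passed in.
std::vector<std::uint8_t> run(Stream& in, Stream& out,
                              const std::vector<std::uint8_t>& data) {
    in.write(data.data(), data.size());
    invert_kernel(in, out, data.size());
    std::vector<std::uint8_t> result(data.size());
    out.read(result.data(), result.size());
    return result;
}
```

Swapping `FifoStream` for `PacketStream` changes how bytes are carried but not a line of `invert_kernel`, which is the property the text attributes to the run-time binding.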

Quick to learn; production quality; first to market

The learning curve to use QuickPlay is modest. Building KPN models may take a little study, but it should be intuitive for most users since it is based upon the natural functional representation of computing systems.

Depending on the HLS tool being used, results might be improved by learning coding styles that result in more efficient hardware generation, but that is optional. Any design can be done without code restructuring, and many designs won’t require it at all.

Whether you use an off-the-shelf board or a custom board, the systems you create using QuickPlay are production-worthy. What that means is that QuickPlay is the fastest way to get from a system idea to a hardware-augmented application. A process that would normally take months is reduced to days, and you can be deploying and shipping to your customers faster than you imagined possible.

All of this makes QuickPlay a unique tool that achieves the long-sought goal of allowing software engineers and hardware engineers to implement systems based on custom FPGA hardware and be production-ready months ahead of fully handcrafted designs. By working in their familiar domain, software engineers can make use of custom hardware as needed, automatically generating hardware-augmented applications. By working with a higher level of abstraction, hardware engineers can benefit from the automatically generated optimized hardware infrastructure, while focusing their unique expertise on a select few key components of the system.

In summary then, QuickPlay uniquely provides a methodology that:

Involves only software tools and techniques, requiring little to no hardware expertise

Is capable of creating a hardware implementation from pure, untimed software

Can integrate functional hardware IP blocks if desired

Can infer and create all necessary supporting hardware infrastructure

Supports a growing ecosystem of FPGA boards

Creates correct-by-construction infrastructure that never needs debugging

Allows functional debug purely in the software domain, with no hardware level debug required

About QuickPlay

QuickPlay is an initiative of PLDA GROUP, a privately-owned, self-funded technology group that has served the embedded electronics industry since 1996 by providing a broad range of leading edge products and services to over 5,000 companies worldwide. QuickPlay embodies our long-time vision that FPGA computing should be as approachable as ubiquitous CPU computing. QuickPlay is the result of years of research in the field of High-Level Design (HLD) and High-Level Synthesis (HLS) tightly coupled with a strong expertise in FPGA hardware and IP design. QuickPlay is proof that innovations happen when talents from different engineering perspectives are brought together to work on a common cause.

PLDA GROUP has R&D offices in France, Italy, Bulgaria, and USA.

More info at: www.quickplay.io