DSP Hardware

DSP PROCESSORS AND DSP IMPLEMENTATION - 1

Introduction

General and special purpose DSP processors

Computer architectures for signal processing

General purpose fixed point DSP processors Selecting DSP Processors

Implementation of DSP algorithms

Special purpose DSP processors

Summary and Problems

Professor E C Ifeachor

11 March, 2003.

1. Introduction

DSP processors are used to implement and execute DSP algorithms in real-time (often real-time implies 'as soon as possible', but within specified time limits).

The main objectives of this section of the DSP course (lectures session and associated laboratory/course work) are to provide an understanding of

(1) Key issues underlying DSP processors and their hardware/software architectures.

(2) How DSP algorithms are implemented for real-time execution using fixed point DSP processors (digital filtering will be used as a vehicle for this).

(3) Finite word length effects in fixed point DSP systems (again using digital filtering as a vehicle for the discussions).

2. General and special purpose DSP processors

For convenience, DSP processors can be divided into two broad categories:

(1) General purpose DSP processors: these are basically high speed microprocessors with hardware and instruction sets optimized for DSP operations. Examples of such processors include fixed-point devices such as the Texas Instruments TMS320C54x and Motorola DSP563x processors, and floating point processors such as the Texas Instruments TMS320C4x and Analog Devices ADSP21xxx SHARC processors.

(2) Special purpose DSP processors: these include (i) hardware designed for efficient execution of specific DSP algorithms (sometimes called algorithm-specific hardware), e.g. for the FFT, and (ii) hardware designed for specific applications (sometimes called application-specific processors), e.g. for PCM in telecommunications or audio applications. Examples of special-purpose DSP processors are Cirrus's processor for digital audio sampling rate converters (CS8420), Mitel's multi-channel telephony voice echo canceller (MT9300), the FFT processor (PDSP16515A) and the programmable FIR filter (VPDSP16256).

3. Computer architectures for signal processing

Standard microprocessors are based on the von Neumann concept, where operations are performed sequentially. An increase in processor speed can then only be achieved by making the individual units of the processor operate faster, but there is a limit to this (see Figure 1). For real-time operation, DSP processors must have an architecture optimised for executing DSP operations; Figure 2 depicts a generic hardware architecture for DSP.

Figure 1 A simplified architecture for standard microprocessors

Figure 2 A simplified generic hardware architecture for DSP

The characteristic features of the architecture of Figure 2 include:

Multiple bus structure, with separate memory spaces for data and programs.

Arithmetic units for logical and arithmetic operations, including a hardware multiplier/accumulator.

Why is such an architecture necessary? Most DSP algorithms, e.g. digital filtering and the FFT, involve repetitive arithmetic operations such as multiplications, additions, memory accesses and heavy data flow through the CPU.

The architecture of standard microprocessors is not suited to this type of activity. An important goal in DSP hardware design is to optimise both the hardware architecture and the instruction set so as to increase speed and make real-time execution possible, whilst keeping quantization errors low. In DSP, this is achieved by making extensive use of the concepts of parallelism. In particular, the following techniques are used:

Harvard architecture

Pipelining

Fast, dedicated hardware multiplier/accumulator

Specialised instructions dedicated to DSP

Replication

On-chip memory/cache.

Extended parallelism: SIMD, VLIW and static superscalar processing.

We will examine some of the above techniques to gain more understanding of the architectural features of DSP processors.

3.1 Harvard architecture

In a standard microprocessor, the program codes and the data are held in one memory space. Thus, the fetching of the next instruction while the current one is executing is not allowed, because the fetch and execution phases each require memory access (see Figure 3).

Figure 3 An illustration of instruction fetch, decode and execute in a non-Harvard architecture with single memory space (a) instruction fetch from memory; (b) timing diagram

NB: The example illustrates reading a value, op1, at address ADR1 in memory into the accumulator and then storing it at two other addresses, ADR2 and ADR3. The instructions could be:

LDA  ADR1    Load the operand op1 into the accumulator from ADR1

STA  ADR2    Store op1 in address ADR2

STA  ADR3    Store op1 in address ADR3

Typically, an instruction in a microprocessor involves three distinct steps:

Instruction fetch

Instruction decode

Instruction execute.

The main feature of the Harvard architecture is that the program and data memories lie in two separate spaces, see Figure 4. This permits a full overlap of instruction fetch and execution.

Figure 4 The basic Harvard architecture with separate data and program spaces;

Figure 5 An illustration of instruction overlap made possible by Harvard architecture.

In a Harvard architecture, since the program codes and data lie in separate memory spaces, the fetching of the next instruction can overlap the execution of the current instruction. Normally, the program memory holds the program codes, whilst the data memory stores variables such as the input data samples.

3.2 Pipelining

This is a technique used extensively in DSP to increase speed, as it allows two or more operations to overlap during execution. In pipelining, a task is broken down into a number of distinct sub-tasks which are then overlapped during execution.

A pipeline is akin to a typical production line in a factory, such as a car or TV assembly plant. As in the production line, the task is broken down into small, independent sub-tasks called pipe stages which are connected in series to form a pipe. Execution is sequential.

Figure 6 An illustration of the concepts of pipelining.

Figure 6 gives a timing diagram of a 3-stage pipeline. Typically, each step in the pipeline takes one machine cycle to complete. Thus, during a given cycle up to three different instructions may be active at the same time, although each will be at a different stage of completion.

Speedup = average instruction time (non-pipelined) / average instruction time (pipelined)    (1)

Example 1

In a non-pipelined processor, the instruction fetch, decode and execute take 35 ns, 25 ns and 40 ns, respectively. Determine the increase in throughput if the instruction steps were pipelined. Assume a 5 ns pipeline overhead at each stage, and ignore other delays.

Solution

In an ideal non-pipelined processor, the average instruction time is simply the sum of the times for instruction fetch, decode and execute:

35 + 25 + 40 ns = 100 ns.

However, if we assume a fixed machine cycle then each instruction would take three machine cycles to complete: 40 ns x 3 = 120 ns (the slowest step, the execute time, determines the cycle time). This corresponds to a throughput of 8.33 million instructions per second.

In the pipelined processor, the clock speed is determined by the speed of the slowest stage plus overheads, i.e. 40 + 5 = 45 ns. The throughput (when the pipeline is full) is 22.2 million instructions per second.

Speedup = average instruction time (non-pipelined) / average instruction time (pipelined) = 120/45 = 2.67

Pipelining has a major impact on the system memory because it leads to an increased number of memory accesses (typically by the number of stages). The use of Harvard architecture where data and instructions lie in separate memory spaces promotes pipelining.

Drill

Assuming the times in the above example are as follows:

fetch    - 20 ns

decode   - 25 ns

execute  - 15 ns

overhead -  1 ns

Determine the increase in throughput if the instructions were pipelined.

Solution

Example 2

Most DSP algorithms are characterised by multiply-and-accumulate operations typified by the following equation:

y = Σ a_k·x_k,  k = 0, 1, ..., N−1

Figure 7 shows a non-pipelined configuration for an arithmetic element for executing the above equation. Assume a transport delay of 200 ns, 100 ns and 100 ns, respectively, for the memory, the multiplier and the accumulator.

(1) What is the system throughput?

(2) Reconfigure the system with pipelining to give a speed increase of 2:1. Illustrate the operation of the new configuration with a timing diagram.

Figure 7 Non-pipelined MAC configuration.

Solution

(1) The coefficients, a_k, and the data arrays, x_k, are stored in memory as shown in Figure 7. In the non-pipelined mode, the coefficients and data are accessed sequentially and applied to the multiplier. The products are summed in the accumulator. Successive MACs will be performed once every 400 ns (200 + 100 + 100), that is, a throughput of 2.5 million operations per second.

(2) The arithmetic operations involved can be broken up into three distinct steps: memory read, multiply, and accumulate. To improve speed these steps can be overlapped. A speed improvement of 2:1 can be achieved by inserting pipeline registers between the memory and multiplier and between the multiplier and accumulator as shown in Figure 8. The timing diagram for the pipeline configuration is shown in Figure 9. As is evident in the timing diagram, the MAC is performed once every 200 ns. The limiting factor is the basic transport delay through the slowest element, in this case the memory. Pipeline overheads have been ignored.

Figure 8 Pipelined MAC configuration. The pipeline registers serve as temporary store for coefficient and data sample pair. The product register also serves as a temporary store for the product.

Figure 9 Timing diagram for a pipelined MAC unit. When the pipeline is full, a MAC operation is performed every clock cycle (200 ns).

DSP algorithms are often repetitive but highly structured, making them well suited to multilevel pipelining. Pipelining ensures a steady flow of instructions to the CPU and, in general, leads to a significant increase in system throughput. However, on occasion pipelining may cause problems (e.g. an unwanted instruction execution, especially near branch instructions).

3.3 Multiplier/Accumulator

The basic numerical operations in DSP are multiplication and addition. Multiplication in software is time consuming. Additions are even worse if floating point arithmetic is used.

To make real-time DSP possible, a fast, dedicated hardware MAC, using either fixed point or floating point arithmetic, is mandatory. Characteristics of a typical fixed point MAC include:

16 x 16-bit 2's complement inputs

16 x 16-bit multiplier with a 32-bit product in 25 ns

32/40-bit accumulator
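A behavioural sketch of one such MAC step in C, using a 64-bit variable to stand in for the wide accumulator (the function name is illustrative, not a real device API):

```c
#include <stdint.h>

/* One fixed-point MAC step: a 16 x 16-bit two's complement multiply gives
   an exact 32-bit product, which is accumulated into a 64-bit variable
   standing in for a 32/40-bit accumulator (the extra bits act as guard
   bits against overflow during long sums). */
int64_t mac16(int64_t acc, int16_t x, int16_t h) {
    int32_t product = (int32_t)x * (int32_t)h;  /* exact, cannot overflow */
    return acc + product;
}
```

With only a 32-bit accumulator, two worst-case products (−32768 x −32768 = 2^30) would already overflow; eight guard bits allow up to 256 full-scale products to be summed safely.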

3.4 Special instructions

These are instructions optimised for DSP; they lead to compact code and increased speed of execution of operations that are repeated. For example, digital filtering requires data shifts or delays to make room for new data, followed by multiplication of the data samples by the filter coefficients, and then accumulation of the products. Recall that FIR filters are characterised by the following equation:

y(n) = Σ h(k)·x(n−k),  k = 0, 1, ..., N−1, where N is the filter length.
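In C, the FIR sum can be written directly; a minimal sketch, assuming x is stored so that x[k] holds the delayed sample x(n−k):

```c
#include <stddef.h>

/* Direct form of the FIR sum y(n) = h(0)x(n) + h(1)x(n-1) + ...
   + h(N-1)x(n-N+1), where x[k] holds the delayed sample x(n-k). */
double fir(const double *h, const double *x, size_t N) {
    double acc = 0.0;
    for (size_t k = 0; k < N; k++)
        acc += h[k] * x[k];   /* one multiply-accumulate per tap */
    return acc;
}
```

Each pass of the loop is exactly the multiply-accumulate that the special instructions below execute in a single cycle.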

In the TMS320C50, for example, the FIR equation can be efficiently implemented using the instruction pair:

RPT   NM1

MACD  HNM1, XNM1

The first instruction, RPT NM1, loads the filter length minus 1 (N-1) into the repeat instruction counter, and causes the multiply-accumulate with data move (MACD) instruction following it to be repeated N times. The MACD instruction performs a number of operations in one cycle:

(1) multiplies the data sample, x(n−k), in the data memory by the coefficient, h(k), in the program memory;

(2) adds the previous product to the accumulator;

(3) implements the unit delay, symbolized by z^−1, by shifting the data sample, x(n−k), up to update the tapped delay line.
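The three per-tap operations map onto ordinary C as follows; this is a behavioural equivalent of the repeated MACD, not the processor's actual microcode, and the names are illustrative:

```c
#include <stddef.h>

/* Behavioural equivalent of RPT/MACD: for each tap, multiply coefficient
   and delayed sample, add the product to the accumulator, and shift the
   sample one place up the tapped delay line (the "data move").
   xbuf[k] holds x(n-k); after the call the samples have aged by one and
   xbuf[0] is free to receive the next input sample. */
long fir_macd(const int *h, int *xbuf, size_t N) {
    long acc = 0;
    for (size_t k = N; k-- > 0; ) {        /* oldest tap first */
        acc += (long)h[k] * xbuf[k];       /* multiply and accumulate */
        if (k + 1 < N)
            xbuf[k + 1] = xbuf[k];         /* unit delay: x(n-k) -> x(n-k-1) */
    }
    return acc;
}
```

Processing the oldest tap first means each sample is used before it is overwritten, which is why the data move costs no extra cycles on the real hardware.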

In the Motorola DSP56000 DSP processor family, as in the TMS320 family, the MAC instruction, together with the repeat instruction (REP) may be used to implement an FIR filter efficiently:

REP  #N-1

MAC  X0, Y0, A  X:(R0)+, X0  Y:(R4)+, Y0

Here the repeat instruction is used with the MAC instruction to perform sustained multiplication and sums of product operations. Again, notice the ability to perform multiple operations with one instruction, made possible by having multiple data paths.

The contents of the registers X0 and Y0 are multiplied together and the product added to the accumulator. At the same time, the next data sample and corresponding coefficient are fetched from the X and Y memories for multiplication.

In most modern DSP processors, the concept of instruction repeat has been taken further by providing instructions that allow a block of code, not just a single instruction, to be repeated a specified number of times. In the TMS320 family (e.g. TMS320C50, TMS320C54 and TMS320C30), the format for repeat execution of a block of instructions, with a zero-overhead loop, is:

     RPTB  loop

     :

loop (last instruction)

Repeat instructions provided by some DSP processors have high level language features. In the Motorola DSP56000 and DSP56300 families, zero-overhead DO loops are provided which may also be nested. The example below illustrates a nested DO loop in which the outer loop is executed N times and the inner loop M times on each pass (N x M times in all).

      DO  #N, LOOP1

      :

      DO  #M, LOOP2

      :

LOOP2 (last instruction in the inner loop is placed here)

      :

LOOP1 (last instruction in the outer loop is placed here)

Nested loops are useful for efficient implementation of DSP functions such as FFT algorithms and two-dimensional signal processing.

Analog Devices DSP processors (e.g. ADSP-2115 and SHARC processors) also have nested-looping capability. The ADSP-2115 supports up to 4 levels of nested loops. The format for looping is:

CNTR = N

DO LOOP UNTIL CE

      :

      :

LOOP: (last instruction in the loop)

The loop is repeated until the counter expires. The loop can contain a large block of instructions, not just a single instruction. The format for nested looping is essentially the same as for the DSP56000 family.

Modern DSP processors also feature application-oriented instructions for applications such as speech coding (e.g. those for codebook search), digital audio (e.g. those for surround sound) and telecommunications (e.g. those for Viterbi decoding). Other application-oriented instructions include those that support coefficient update for adaptive filters and bit-reversed addressing for FFTs (see later).

3.5 Extended parallelism - SIMD, VLIW and static superscalar processing

The trend in DSP processor architecture design is to increase both the number of instructions executed in each cycle and the number of operations performed per instruction to enhance performance. In newer DSP processor architectures, parallel processing techniques are extensively used to achieve increased computational performance. The three techniques that are used, often in combination, are:

Single instruction, multiple data (SIMD) processing.

Very-long-instruction-word (VLIW) processing.

Superscalar processing.

Figure 10 An illustration of the use of SIMD processing and multiple data size capability to extend the number of multiplier/accumulators (MACs) from one to four in a TigerSHARC DSP processor.

Note: SIMD processing is used to increase the number of operations performed per instruction. Typically, in DSP processors with SIMD architectures the processor has multiple data paths and multiple execution units. Thus, a single instruction may be issued to the multiple execution units to process blocks of data simultaneously and in this way the number of operations performed in one cycle is increased.
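The idea can be mimicked in plain C, with one function call standing in for one SIMD instruction; the four-lane layout mirrors Figure 10, but the code is only an illustrative sketch of the behaviour:

```c
#include <stdint.h>

#define LANES 4

/* SIMD-style MAC: one "instruction" (here, one call) drives four MAC
   units at once, each lane working on its own data path. On real SIMD
   hardware the four lanes execute in the same cycle. */
void simd_mac(int32_t acc[LANES], const int16_t x[LANES],
              const int16_t h[LANES]) {
    for (int i = 0; i < LANES; i++)
        acc[i] += (int32_t)x[i] * h[i];   /* all four lanes per issue */
}
```

The gain comes from issuing one instruction instead of four, provided the algorithm supplies four independent data streams.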

Figure 11 Principles of very long instruction word (VLIW) architecture and data flow in the advanced, fixed point DSP processor, TMS320C62x.

Note: Very-long-instruction-word processing is an important approach for substantially increasing the number of instructions that are processed per cycle. A very long instruction word is essentially a concatenation of several short instructions and requires multiple execution units, running in parallel, to carry out the instructions in a single cycle. In the TMS320C62x, the CPU contains two data paths and eight independent execution units, organised in two sets - (L1, S1, M1 and D1) and (L2, S2, M2 and D2). In this case, each short instruction is 32 bits wide and eight of these are linked together to form a very long instruction word packet which may be executed in parallel. VLIW processing starts when the CPU fetches an instruction packet (eight 32-bit instructions) from the on-chip program memory. The eight instructions in the fetch packet are formed into an execute packet, if they can be executed in parallel, and then dispatched to the eight execution units as appropriate. The next 256-bit instruction packet is fetched from the program memory while the execute packet is decoded and executed. If the eight instructions in a fetch packet are not executable in parallel, then several execute packets will be formed and dispatched to the execution units, one at a time. A fetch packet is always 256 bits wide (eight instructions), but an execute packet may vary between 1 and 8 instructions.

Figure 12 Principles of superscalar architecture and data flow in the

TigerSHARC DSP processor

Note: Superscalar processing is used to increase the instruction rate of a DSP processor by exploiting instruction-level parallelism. Traditionally, the term superscalar refers to computer architectures that enable multiple instructions to be executed in one cycle. Such architectures are widely used in general purpose processors, such as PowerPC and Pentium processors. In superscalar DSP processors, multiple execution units are provided and several instructions may be issued to the units for concurrent execution. Extensive use is also made of pipelining techniques to increase performance further. The TigerSHARC is described as a static superscalar DSP processor because parallelism in the instructions is determined before run-time. In fact, the TigerSHARC processor combines SIMD, VLIW and superscalar concepts. This advanced DSP processor has multiple data paths and two sets of independent execution units, each with a multiplier, an ALU, a 64-bit shifter and a register file. The TigerSHARC is a floating point processor, but it supports fixed point arithmetic with multiple data types (8-, 16- and 32-bit numbers). The instruction width is not fixed in the TigerSHARC processor. In each cycle, up to four 32-bit instructions are fetched from the internal program memory and issued to the two sets of execution units in parallel. An instruction may be issued to both units in parallel (SIMD instructions) or to each execution unit independently. Each execution unit (ALU, multiplier or shifter) takes its inputs from, and returns its results to, the register file. The register files are connected to the three data paths and so can simultaneously read two inputs and write an output to memory in a cycle. This load/store architecture is suited to basic DSP operations, which often take two inputs and compute an output. Because the processor can work on several data sizes, the execution units allow further levels of parallel computation. Thus, in each cycle the TigerSHARC can execute up to eight add/subtract operations and eight multiply-accumulate operations with 16-bit inputs, instead of two multiply-accumulate operations with 32-bit inputs.

4. General purpose fixed point DSP processors

General-purpose DSP processors have evolved substantially over the last decade as a result of the never-ending quest to find better ways to perform DSP operations, in terms of computational efficiency, ease of implementation, cost, power consumption, size, and application-specific needs. The insatiable appetite for improved computational efficiency has led to substantial reductions in instruction cycle times and, more importantly, to increasing sophistication in the hardware and software architectures. It is now common to have dedicated, on-chip arithmetic hardware units (e.g. to support fast multiply/accumulate operations), large on-chip memory with multiple access and special instructions for efficient execution of inner core computations in DSP. We have also seen a trend towards increased data word sizes (e.g. to maintain signal quality) and increased parallelism (to increase both the number of instructions executed in one cycle and the number of operations performed per instruction). Thus, we find that in newer general purpose DSP processors increasing use is made of multiple data paths and arithmetic units to support parallel operations. DSP processors based on SIMD, VLIW and superscalar architectures are being introduced to support efficient parallel processing. In some DSP processors, performance is enhanced further by using specialised, on-chip co-processors to speed up specific DSP algorithms such as FIR filtering and Viterbi decoding. The explosive growth in communications and digital audio technologies has had a major influence on the evolution of DSP processors, as has the growth in embedded DSP processor applications.

A summary of key features of four generations of fixed-point DSP processors from four leading semiconductor manufacturers is given in Table 1. The classification of DSP processors into the four generations is partly based on historical reasons, architectural features and computational performance.

The basic architecture of the first generation fixed point DSP processor family (TMS320C1x), first introduced in 1982 by Texas Instruments, is depicted in Figure 13. A typical second generation DSP processor is depicted in Figure 14 (Motorola DSP5600x).

Figure 13 A simplified architecture of a first generation fixed point DSP processor (Texas Instruments TMS320C10)

Figure 14 A simplified architecture of a second generation fixed point DSP

(Motorola DSP56000).

Third generation fixed point DSP processors are essentially enhancements of second generation DSP processors. Compared to the second generation, features of the third generation DSP processors include more data paths (typically three compared to two in the second generation), wider data paths, larger on-chip memory and instruction cache, and in some cases a dual MAC. As a result, third generation DSP processors typically have performance two to three times that of second generation DSP processors of the same family. Simplified architectures of two third generation DSP processors are depicted in Figure 15 (TMS320C54x) and Figure 16 (DSP563x). Most of the third generation fixed-point DSP processors are aimed at applications in digital communications and digital audio, reflecting the enormous growth and influence of these application areas on DSP processor development. Thus, we find features in some of the processors that support these applications.

The TMS320C54x, for example, includes special instructions for adaptive filtering (which is often used for echo cancellation and adaptive equalisation in telecommunications) and to support Viterbi decoding. In the third generation processors, semiconductor manufacturers have also taken the issue of power consumption seriously (because of its importance in portable and hand-held devices such as mobile phones). Most of the third generation DSP processors are low power and have power management facilities.

Fourth generation fixed point DSP processors with their new architectures are primarily aimed at large and/or emerging multichannel applications, such as digital subscriber loops, remote access server modems, wireless base stations, third generation mobile systems and medical imaging. The new fixed point architecture that has attracted a great deal of attention in the DSP community is the very long instruction word (VLIW) architecture. It makes extensive use of parallelism whilst retaining some of the good features of previous DSP processors. Compared to previous generations, fourth generation fixed point DSP processors, in general, have wider instruction words, wider data paths, more registers, larger instruction caches and multiple arithmetic units, enabling them to execute many more instructions and operations per cycle. The Texas Instruments TMS320C62x family of fixed point DSP processors is based on the VLIW architecture. The core processor has two independent arithmetic paths, each with four execution units: a logic unit (Li), a shifter/logic unit (Si), a multiplier (Mi) and a data address unit (Di). Typically, the core processor fetches eight 32-bit instructions at a time, giving an instruction width of 256 bits (hence the term very long instruction word). With a total of eight execution units, four in each data path, the TMS320C62x can execute up to eight instructions in parallel in one cycle. The processor has large program and data cache memories (typically, 4 Kbyte of level 1 program/data cache and 64 Kbyte of level 2 program/data cache). Each data path has its own register file (sixteen 32-bit registers), but can also access registers on the other data path. Advantages of VLIW architectures include simplicity and high computational performance. Disadvantages include increased program memory usage (organisation of code to match the inherent parallelism of the processor may lead to inefficient use of memory). Further, optimum processor performance can only be achieved when all the execution units are busy, which is not always possible because of data dependencies, instruction delays and restrictions on the use of the execution units. However, sophisticated programming tools are available for code packing, instruction scheduling, resource assignment and, in general, to exploit the vast potential of the processor.

5. Floating-point DSP processors

The ability of DSP processors to perform high speed, high precision DSP operations using floating point arithmetic has been a welcome development. This minimises finite word length effects such as overflows, round off errors, and coefficient quantization errors inherent in DSP. It also facilitates algorithm development, as a designer can develop an algorithm on a large computer in a high level language and then port it to a DSP device more readily than with fixed point.

Floating point DSP processors retain key features of fixed point processors such as special instructions for DSP operations and multiple data paths for multiple operations. As in the case of fixed point DSP processors, floating point DSP processors available are significantly different architecturally. Some of the key features of the three generations of floating point DSP processors from Texas Instruments and Analog Devices are summarised in Table 2.

Table 1. Features of general purpose fixed-point DSPs from Texas Instruments, Motorola and Analog Devices.

Table 2 Features of general purpose floating-point DSPs from Texas Instruments, Motorola and Analog Devices.

6. Selecting DSP Processors

The choice of a DSP processor for a given application is an important issue because of the wide range of processors available. Specific factors that may be considered when selecting a DSP processor for an application include architectural features, execution speed, type of arithmetic and word length:

(1) Architectural features. Most DSP processors available today have good architectural features, but these may not be adequate for a specific application. Key features of interest include the size of on-chip memory, special instructions and I/O capability. On-chip memory is an essential requirement in most real-time DSP applications for fast access to data and rapid program execution. For memory-hungry applications (e.g. digital audio Dolby AC-2, FAX/modem, MPEG coding/decoding), the size of internal RAM may become an important distinguishing factor. Where internal memory is insufficient, it can be augmented by high speed, off-chip memory, although this may add to system costs. For applications that require fast and efficient communication or data flow with the outside world, I/O features such as interfaces to ADCs and DACs, DMA capability and support for multiprocessing may be important. Depending on the application, a rich set of special instructions to support DSP operations is important, e.g. zero-overhead looping capability, dedicated DSP instructions, and circular addressing.

(2) Execution speed. The speed of DSP processors is an important measure of performance because of the time-critical nature of most DSP tasks. Traditionally, the two main units of measurement for this are the clock speed of the processor, in MHz, and the number of instructions performed, in millions of instructions per second (MIPS) or, in the case of floating point DSP processors, in millions of floating point operations per second (MFLOPS). However, such measures may be inappropriate in some cases because of significant differences in the way different DSP processors operate, with most able to perform multiple operations in one machine instruction. For example, the C62x family of processors can execute as many as eight instructions in a cycle. The number of operations performed in each cycle also differs from processor to processor. Thus, comparison of the execution speed of processors based on such measures may not be meaningful. An alternative measure is based on the execution speed of benchmark algorithms, e.g. DSP kernels such as the FFT and FIR and IIR filters. In Tables 1 and 2, performance indices based on such benchmarks are given to indicate the relative performance of a number of popular DSP processors.
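The flavour of such a kernel benchmark can be shown in portable C. The tap count, repetition count and kernel below are illustrative choices, and timings on a host PC say nothing about any particular DSP processor; the point is only the method of timing a repeated kernel.

```c
#include <stdio.h>
#include <time.h>

#define N    64
#define REPS 100000

/* Crude benchmark of an FIR kernel, the kind of code used when comparing
   processors on DSP workloads. clock() is coarse, hence the repetitions.
   Returns a checksum so the kernel cannot be optimised away. */
double bench_fir(void) {
    static double h[N], x[N];
    double y = 0.0;
    for (int k = 0; k < N; k++) { h[k] = 1.0 / N; x[k] = (double)k; }

    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++) {
        double acc = 0.0;
        for (int k = 0; k < N; k++)
            acc += h[k] * x[k];           /* N multiply-accumulates */
        y += acc;
    }
    clock_t t1 = clock();

    printf("approx. time per %d-tap FIR pass: %.1f ns\n", N,
           1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / REPS);
    return y;
}
```

Published DSP benchmarks use the same idea but count processor cycles rather than wall-clock time.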

(3) Type of arithmetic. The two most common types of arithmetic used in modern DSP processors are fixed point and floating point. Floating point arithmetic is the natural choice for applications with wide and variable dynamic range requirements (dynamic range may be defined as the difference between the largest and smallest signal levels that can be represented, or the difference between the largest signal and the noise floor, measured in decibels). Fixed point processors are favoured in low cost, high volume applications (e.g. cellular phones and computer disk drives). The use of fixed point arithmetic raises issues associated with dynamic range constraints which the designer must address (see later). In general, floating point processors are more expensive than fixed point processors, although the cost difference has fallen significantly in recent years. Most floating point DSP processors available today also support fixed point arithmetic.

(4) Word length. Processor data word length is an important parameter in DSP as it can have a significant impact on signal quality. It determines how accurately parameters and the results of DSP operations can be represented (see later for details). In general, the longer the data word, the lower the errors introduced by digital signal processing. In fixed point audio processing, for example, a processor word length of at least 24 bits is required to keep the smallest signal level sufficiently above the noise floor generated by signal processing to maintain CD quality. A variety of processor word lengths is used in fixed point DSP processors, depending on the application (see Table 1). Fixed point DSP processors aimed at telecommunications markets tend to use a 16-bit word length (e.g. TMS320C54x), whereas those aimed at high quality audio applications tend to use 24 bits (e.g. DSP56300). In recent years, we have seen a trend towards the use of more bits in ADCs and DACs (e.g. the Cirrus 24-bit audio codec, CS4228) as the cost of these devices falls to meet the insatiable demand for increased quality. Thus, we are likely to see an increased demand for larger processor word lengths for audio processing. In fixed point processors, it may also be necessary to provide guard bits (typically 1 to 8 bits) in the accumulator to prevent arithmetic overflows during extended multiply and accumulate operations. The extra bits effectively extend the dynamic range available in the DSP processor. In most floating point DSP processors, a 32-bit data size (24-bit mantissa and 8-bit exponent) is used for single-precision arithmetic. This size is also compatible with the IEEE floating point format (IEEE 754). Most floating point DSP processors also have fixed point arithmetic capability, and often support variable data size fixed point arithmetic.

In practice, factors such as experience/familiarity with a particular DSP processor family, ease of use, time to market and costs may be the over-riding factors in selecting a given processor.

Problems

1. Analogue I/O

2. IIR filter design amplitude distortion and filter order

3. FIR filter design - half band filters.

4. DSP processors and DSP implementation

5. Multirate systems

6. Adaptive systems.

