Notes DSP Processors

DSP Processors

Engr. Naveed Khan Baloch

Digital Signal Processing

- Processing of digitally represented signals
- Signals are represented digitally via a sequence of samples
- Digital signals are obtained from physical signals via transducers and analog-to-digital converters (ADCs)

Digital Signal Processor

- Electronic system that processes digital signals

Definition

A digital signal processor (DSP) is a specialized microprocessor whose architecture is optimized for the fast operational needs of digital signal processing.

DSP Applications

- Audio: coding, decoding, surround sound
- Communication: scrambling, cellular phones, software radios
- Control: robotics, disk drive control, motor control
- Medical: diagnostic equipment, hearing aids
- Defense: radar and sonar processing, missile guidance

Why DSP Processors?

- Reprogrammable
- Cost effective
- Fast computation
- Energy efficiency
- Fast multipliers
- Multiple execution units
- Efficient memory accesses
- Circular buffering
- Data format
- Zero-overhead looping
- Streamlined I/O
- Specialized instruction sets
- SIMD

Reprogrammable

Cost Effective

- There is no need for a separate signal processing unit.
- Signal processing and control functions can be performed on a single silicon chip.

Faster computation

- Specialized hardware for DSP applications makes computation very fast.
- Separate MAC units and fast multipliers speed up the execution of many DSP algorithms, e.g.:
  - FIR filter
  - IIR filter
  - DCT
  - FFT
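The role of the MAC unit can be illustrated with a direct-form FIR filter, in which every output sample is a sum of products. The following is a minimal Python sketch, not part of the original notes; the function name `fir_filter` and the sample coefficients are illustrative assumptions.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: one multiply-accumulate (MAC) per tap."""
    # Delay line holding the most recent inputs, newest first.
    history = [0] * len(coeffs)
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for h, s in zip(coeffs, history):
            acc += h * s  # the MAC operation a DSP performs in one cycle
        outputs.append(acc)
    return outputs


# Feeding a unit impulse returns the coefficients themselves.
impulse_response = fir_filter([1, 0, 0, 0], [1, 2, 3])
```

A filter with N taps needs N MACs per output sample, which is why a single-cycle MAC unit has such a direct effect on throughput.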

Fast Multipliers

- Originally, microprocessors implemented multiplication as a series of shift and add operations, each of which consumed one or more clock cycles.
- Most DSP processors need only one clock cycle for the multiplication operation.
- Modern DSP processors include at least one dedicated single-cycle multiplier or combined multiply-accumulate (MAC) unit.
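To see why shift-and-add multiplication is slow, the sketch below counts the iterations needed to multiply two non-negative integers using only shifts and adds. It is an illustrative model, assuming one loop iteration corresponds to at least one clock cycle; the name `shift_add_multiply` is hypothetical.

```python
def shift_add_multiply(a, b):
    """Multiply non-negative integers with shifts and adds only.

    Returns (product, steps), where steps is the number of loop
    iterations, one per bit of the multiplier b.
    """
    product = 0
    steps = 0
    while b:
        if b & 1:          # add the shifted multiplicand when the bit is set
            product += a
        a <<= 1            # next partial product
        b >>= 1            # next multiplier bit
        steps += 1
    return product, steps


product, steps = shift_add_multiply(13, 11)
```

A 16-bit multiplier can take up to 16 such iterations, whereas a dedicated hardware multiplier produces the result in a single cycle.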

Multiple Execution Units

DSP applications typically have very high computational requirements in comparison to other types of computing tasks, since they often must execute DSP algorithms (such as FIR filtering) in real time on lengthy segments of signals sampled at 10-100 kHz or higher. Hence, DSP processors often include several independent execution units that are capable of operating in parallel. For example, in addition to the MAC unit, they typically contain an arithmetic-logic unit (ALU) and a shifter.

Efficient Memory Accesses

- A small bank of RAM near the processor core is used as an instruction cache.
- Many DSP processors also support circular addressing, which allows the processor to access a block of data sequentially and then automatically wrap around to the beginning address.

Circular Buffering

- The process by which the Data Address Generator (DAG) wraps around or repeatedly steps through a range of memory addresses.
- Instructions accommodate three elements:
  - Buffer address
  - Buffer size
  - Increment

Data Format

- Fixed-point and floating-point processors.
- Use of an accumulator to reduce overflow.
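The point about the accumulator can be made concrete with fixed-point arithmetic. The sketch below assumes the common Q15 format (16-bit signed fractions) and a 40-bit accumulator; both choices, and all function names, are assumptions for illustration and are not specified in the notes.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed integer with the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def to_q15(x):
    """Convert a float in [-1, 1) to Q15, saturating at the limits."""
    return saturate(int(round(x * 32768)), 16)


def mac_q15(a_values, b_values, acc_bits=40):
    """Sum of Q15 products held in an accumulator of acc_bits width."""
    acc = 0
    for a, b in zip(a_values, b_values):
        acc = saturate(acc + a * b, acc_bits)  # each product is Q30
    return acc


def acc_to_q15(acc):
    """Scale a Q30 accumulator value back to Q15, saturating."""
    return saturate(acc >> 15, 16)


half = [to_q15(0.5)] * 8
wide = mac_q15(half, half)                 # fits in 40 bits
narrow = mac_q15(half, half, acc_bits=32)  # clipped at the 32-bit limit
```

With eight products of 0.5 x 0.5 the running sum exceeds the 32-bit range, so the narrow accumulator loses information during the summation, while the extra guard bits of the wide accumulator preserve the exact intermediate result.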

Assignment # 2

Highlight the differences between the architectures of fixed-point and floating-point DSP processors, with at least two examples from TI and Blackfin processors.

- www.TI.com
- www.analog.com

Zero-Overhead Looping

A special loop or repeat instruction is provided, which allows the programmer to implement a for-next loop without expending any clock cycles on updating and testing the loop counter or branching back to the top of the loop. This feature is often referred to as zero-overhead looping.

Streamlined I/O

To allow low-cost, high-performance input and output, most DSP processors incorporate one or more specialized serial or parallel I/O interfaces, and streamlined I/O handling mechanisms, such as low-overhead interrupts and direct memory access (DMA), to allow data transfers to proceed with little or no intervention from the processor's computational units.

Specialized Instruction Sets

DSP processor instruction sets have traditionally been designed with two goals in mind:

- Make maximum use of the processor's underlying hardware
- Minimize the amount of memory space required to store DSP programs

The resulting instruction sets are:

- Highly specialized
- Complicated
- Irregular

Use assembly instead of C for maximum benefit.

SIMD

- SIMD, or single-instruction, multiple-data, is not a class of architecture itself, but is instead an architectural technique that can be used within any of the classes of architectures.
- Improves performance on some algorithms by allowing the processor to execute multiple instances of the same operation.
- For example, a SIMD multiplication instruction could perform two or more multiplications on different sets of input operands in parallel in a single clock cycle.

Outline

- Blackfin Family Overview
- The Blackfin Core
  - Arithmetic operations
  - Data fetching
  - Sequencing
- The Blackfin Bus Architecture and Memory
  - Modified Harvard architecture
  - Hierarchical memory structure
  - Flexible memory management
- Additional Blackfin Core Features
  - DMA
  - Dynamic power management
  - On-chip debug support

Blackfin Family Overview

The Blackfin family consists of:

- A broad range of Blackfin processors
- Software development tools
- Hardware evaluation and debug tools
- Extensive third-party support
  - Development tools
  - Operating systems
  - TCP/IP stacks
  - Hardware building blocks
  - Software solutions

Blackfin Processors

- All Blackfin processors combine extensive DSP capability with high-end MCU functions on the same core.
  - Creates a highly efficient and cost-effective solution.
  - A single software development tool chain
- All Blackfin processors are based on the same core architecture.
  - Once you understand one Blackfin processor, you can easily migrate from one family member to another.
  - Code compatible across family members.
- Processors vary in clock speed, amount of on-chip memory, peripheral suite, package types and sizes, power, and price.
  - Large selection lets you optimize your choice of a Blackfin processor for your application.

Blackfin Family Peripherals

The Blackfin family supports a wide variety of I/O:

- EBIU (External Bus Interface Unit)
- Parallel peripheral interface (PPI)
- Serial ports (SPORTs)
- GPIO
- Timers
- UARTs
- SPI
- Ethernet
- USB
- CAN
- Two Wire Interface (TWI)
- Pixel compositor
- Lockbox secure technology
- Host DMA
- ATAPI
- SDIO

Blackfin Processors Perform Signal Processing and Microcontroller Functions

Blackfin Architecture

What does it mean for the developer?

- Combining controller and DSP capabilities into a single core, along with rich I/O, enables development of efficient, low-cost embedded media applications.
  - For example, multimedia over IP, digital cameras, telematics, software radio
- From a development perspective, a single core means there is only one tool chain.
  - An embedded application consisting of both control and signal processing modules is built using the same compiler.
  - The result is dense control code and high-performance DSP code.

Features

- Controller
  - L1 memory space for stack and heap
  - Dedicated stack and frame pointers
  - Byte addressability
  - Simple bit-level manipulation
- DSP
  - Fast, flexible arithmetic computational units
  - Unconstrained data flow to/from computational units
  - Extended precision and dynamic range
  - Efficient sequencing
  - Efficient I/O processing

The DSP aspect of the Blackfin core is optimized to perform FFTs and convolutions.

Blackfin Core (e.g., ADSP-BF54x)

The Blackfin Core

The core consists of:

- Arithmetic unit
  - Supports SIMD operation
  - Load/store architecture
- Addressing unit
  - Supports dual data fetch
- Sequencer
  - Efficient program flow control
- Register files
  - Data
  - Addressing

The Arithmetic Unit

- Performs arithmetic operations
- Dual 40-bit ALU (Arithmetic/Logic Unit)
  - Performs 16-/32-/40-bit arithmetic and logical operations
- Dual 16 x 16 multiplier
  - Performs dual MACs (multiply-accumulates) when used with ALUs
- Barrel shifter
  - Performs shifts, rotates, bit operations

Data Registers

- There are 8x 32-bit registers in the data register file.
  - Used to hold 32-bit values or packed 16-bit values
- There are also 2x 40-bit accumulators.
  - Typically used for MAC operations
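Holding two packed 16-bit values in one 32-bit register is what makes SIMD-style arithmetic possible. The sketch below models this in Python; the helper names `pack16`, `unpack16`, and `dual_add16` are hypothetical and only illustrate the idea of operating on both halves independently.

```python
MASK16 = 0xFFFF


def pack16(high, low):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((high & MASK16) << 16) | (low & MASK16)


def unpack16(word):
    """Split a 32-bit word into its signed 16-bit halves (high, low)."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed((word >> 16) & MASK16), signed(word & MASK16)


def dual_add16(x, y):
    """Add both 16-bit halves independently, with no carry between them."""
    high = ((x >> 16) + (y >> 16)) & MASK16
    low = ((x & MASK16) + (y & MASK16)) & MASK16
    return (high << 16) | low


result = unpack16(dual_add16(pack16(100, -3), pack16(-50, 7)))
```

Each half wraps around on its own, so an overflow in the low half never disturbs the high half. A saturating variant would clamp each half instead of wrapping.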

16-Bit ALU Operations: Examples

The algebraic assembly syntax is intuitive and makes it easy to understand what the instruction is doing.

32-Bit ALU Operations: Examples

Dual MAC Operations: Example

Barrel Shifter

- Enables shifting or rotating any number of bits within a 16-/32-/40-bit register in a single cycle
- Performs individual bit operations on 32-bit data register contents
  - BITSET, BITCLR, BITTGL, BITTST
- Field Extract and Deposit instructions
  - Extract or insert a field of bits out of or into a 32-bit data register
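The bit and field operations listed above can be modelled on a 32-bit value as follows. This is a generic sketch of what such operations compute, not the operand encoding of the actual instructions; the Python function names simply mirror the mnemonics, and the `(pos, length)` parameters are assumptions.

```python
MASK32 = 0xFFFFFFFF


def bitset(reg, n):
    """Set bit n."""
    return (reg | (1 << n)) & MASK32


def bitclr(reg, n):
    """Clear bit n."""
    return reg & ~(1 << n) & MASK32


def bittgl(reg, n):
    """Toggle bit n."""
    return (reg ^ (1 << n)) & MASK32


def bittst(reg, n):
    """Return True when bit n is set."""
    return bool((reg >> n) & 1)


def extract(reg, pos, length):
    """Read an unsigned field of `length` bits starting at bit `pos`."""
    return (reg >> pos) & ((1 << length) - 1)


def deposit(reg, pos, length, value):
    """Write `value` into the field of `length` bits starting at bit `pos`."""
    mask = ((1 << length) - 1) << pos
    return (reg & ~mask & MASK32) | ((value << pos) & mask)


field = extract(0xABCD1234, 8, 8)
updated = deposit(0xABCD1234, 8, 8, 0xFF)
```

In software each of these takes several shift and mask steps. The barrel shifter lets the hardware complete them in one instruction.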


8-Bit ALU Operations

Four 8-bit ALUs provide parallel computational power targeted mainly at video operations.

- Quad 8-bit add/subtract
- Quad 8-bit average
- SAA (Subtract-Absolute-Accumulate) instruction

A quad 8-bit ALU instruction takes one cycle to complete.
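Subtract-absolute-accumulate is the core of the sum-of-absolute-differences measure used in video block matching. The sketch below shows the arithmetic on four bytes packed in a 32-bit word; the accumulator layout is simplified to a plain list, and the names `bytes_of` and `quad_saa` are hypothetical.

```python
def bytes_of(word):
    """Split a 32-bit word into four unsigned bytes, lowest byte first."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def quad_saa(x, y, accumulators):
    """For each of the four byte lanes, add |x - y| to its accumulator."""
    return [
        acc + abs(a - b)
        for acc, a, b in zip(accumulators, bytes_of(x), bytes_of(y))
    ]


lanes = quad_saa(0x10203040, 0x40302010, [0, 0, 0, 0])
sad = sum(lanes)  # total sum of absolute differences over the four pixels
```

Repeating the call over all pixel rows of a block, while carrying the accumulators along, gives the block's total difference from a candidate reference block.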

Additional Arithmetic Instructions

There are a number of specialized instructions that are used to speed up the inner loop of various algorithms.

- Bitwise XOR
  - Enables creating LFSRs (Linear Feedback Shift Registers) for use in CRC calculations or the generation of PRN sequences
- Bit stream multiplexing, add on sign, compare select
  - Convolutional encoder and Viterbi decoder support
- Add/Subtract with prescale up/down
  - IEEE 1180 compliant 2D 8 x 8 DCTs (Discrete Cosine Transforms)
- Vector search
  - Enables searching a vector a pair at a time for the greatest or least value
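As an illustration of how XOR feedback produces a pseudo-random (PRN) sequence, here is a small Fibonacci LFSR in Python. The 4-bit width and the tap positions are chosen only to keep the example easy to check by hand; they are assumptions and do not come from the notes.

```python
def lfsr_step(state, taps=(0, 1), width=4):
    """Advance a Fibonacci LFSR by one step.

    The feedback bit is the XOR of the tapped bits; it is shifted in
    at the most significant position while the register shifts right.
    """
    feedback = 0
    for tap in taps:
        feedback ^= (state >> tap) & 1
    return (state >> 1) | (feedback << (width - 1))


def lfsr_period(seed=1, taps=(0, 1), width=4):
    """Count the steps until the register returns to its seed."""
    state = lfsr_step(seed, taps, width)
    count = 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        count += 1
    return count


first_states = [1]
for _ in range(4):
    first_states.append(lfsr_step(first_states[-1]))
```

With these taps the register visits all 15 non-zero 4-bit states before repeating, which is the maximum possible for this width. Practical CRC and PRN generators use wider registers built on the same XOR-and-shift step.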


The Addressing Unit

- The addressing unit generates addresses for data fetches.
- Two DAG (Data Address Generator) arithmetic units enable generation of independent 32-bit wide addresses that can reach anywhere within the Blackfin memory space.
- Up to two fetches can occur at the same time.

Address Registers

- There are 6x general-purpose Pointer Registers.
  - Used for GP 8-/16-/32-bit fetches from memory
- There are four sets of registers used for DSP-style data accesses.
  - Used for 16-/32-bit DSP data fetches such as dual data fetch, circular buffer addressing, and bit reversal
- There are also dedicated stack (SP) and frame (FP) pointers.
  - These are used for 32-bit accesses to stack frames.

Addressing

The Addressing Unit supports:

- Addressing only
  - With specified Pointer or Index Register
- Provide address and post modify
  - Add an offset after the fetch is done
  - Circular buffering supported with this method
- Provide address with an offset
  - Add an offset before the fetch, but no pointer update
- Update address only
- Modify address with reverse carry add

All addressing is Register Indirect.
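Reverse carry addition is what produces the bit-reversed access order used by FFT algorithms: the carry propagates from the most significant bit toward the least significant bit. The following Python sketch emulates this by reversing the bits, adding normally, and reversing back; the function names are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(index, modifier, width):
    """Add with the carry travelling from the MSB down to the LSB."""
    total = reverse_bits(index, width) + reverse_bits(modifier, width)
    return reverse_bits(total & ((1 << width) - 1), width)


def bit_reversed_order(width):
    """Index sequence produced by repeatedly adding N/2 with reverse carry."""
    index, order = 0, []
    for _ in range(1 << width):
        order.append(index)
        index = reverse_carry_add(index, 1 << (width - 1), width)
    return order


order = bit_reversed_order(3)
```

For an 8-point transform the sequence is 0, 4, 2, 6, 1, 5, 3, 7, which is exactly the order in which a radix-2 FFT reads or writes its data. Doing this in the address generator avoids a separate reordering pass.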

Addressing

- Index Registers I0-I3 (32-/16-bit accesses)
- Pointer Registers P0-P5 (32-/16-/8-bit accesses)
- Stack and Frame Pointer Registers (32-bit accesses)
- All addresses are byte addresses.
  - Ordering is little endian.
- Addresses must be aligned for the word size being fetched.
  - i.e., 32-bit fetches from addresses that are a multiple of four
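The byte-addressing rules above can be checked with a short Python model. It uses the standard `struct` module to decode little-endian data; the helper names and the choice to raise `ValueError` on a misaligned access are assumptions made for this sketch.

```python
import struct


def is_aligned(address, size_bytes):
    """True when address is a multiple of the access size."""
    return address % size_bytes == 0


def load32_le(memory, address):
    """Read a 32-bit little-endian word, rejecting misaligned addresses."""
    if not is_aligned(address, 4):
        raise ValueError("misaligned 32-bit access")
    return struct.unpack_from("<I", memory, address)[0]


# The least significant byte is stored at the lowest address.
memory = bytes([0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00, 0x00])
word = load32_le(memory, 0)
```

Address 0 and address 4 are valid for a 32-bit fetch in this model, while address 2 is valid only for 16-bit or 8-bit accesses.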


Circular Buffer Example

- Base address (B) and starting address (I) = 0
- Buffer length L = 44 (there are 11 data elements and each data element is 4 bytes)
- Modify value M = 16 (4 elements * 4 bytes/element)
- Example memory access: R1 = [I0 ++ M2];

The Addressing Unit supports circular buffer pointer addressing.

- The process of boundary checking and pointer wrapping to stay in bounds happens in hardware with no overhead.
- Buffers can be placed anywhere in memory without restriction due to the Base address registers.
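The example values above (B = 0, L = 44, M = 16) can be traced with a small Python model of the post-modify step. The wrap rule used here, subtracting the length once the index passes the end of the buffer, is the usual convention and is assumed rather than taken from the notes; the function names are hypothetical.

```python
def post_modify(index, modify, base, length):
    """Return the index after a circular post-modify by `modify` bytes."""
    index += modify
    if index >= base + length:   # ran past the end: wrap to the start
        index -= length
    elif index < base:           # ran before the start (negative modify)
        index += length
    return index


def access_sequence(count, start=0, modify=16, base=0, length=44):
    """Byte addresses used by `count` successive post-modify accesses."""
    addresses = []
    index = start
    for _ in range(count):
        addresses.append(index)
        index = post_modify(index, modify, base, length)
    return addresses


trace = access_sequence(6)
```

The first accesses hit byte addresses 0, 16, 32 and then wrap to 4, 20, 36. After 11 accesses every element has been visited once and the index is back at 0.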

The Sequencer

The sequencer's function is to generate addresses for fetching instructions.

- Uses a variety of registers to select the next address
- Aligns instructions as they are fetched
  - Always reads 64 bits from memory
  - Realigns what is fetched into individual 16-/32-/64-bit opcodes before sending to the execution pipeline
- Handles events
  - Interrupts and exceptions
- Conditional execution
