Types of DSP Architectures


DSP Processor Architecture and Features

Transcript of Types of DSP Architectures

Page 1: Types of DSP Architectures

Types of Architectures

• Von Neumann Architecture

• Harvard Architecture

• Super Harvard Architecture

• VLIW Architecture

Page 2: Types of DSP Architectures

The Von Neumann Architecture

• Developed by John von Neumann in 1946. Development took place at the University of Pennsylvania's Moore School of Electrical Engineering.

• This is the most widely used architecture today, and it is implemented by the majority of microprocessors on the market.

• In this architecture, all the elements of the computer are interconnected by a single system of three buses: the data bus, the address bus, and the control bus.

Page 3: Types of DSP Architectures

The Von Neumann Architecture

A model for designing and building computers, based on the following three characteristics:

1) The computer consists of four main subsystems:
• Memory
• ALU (Arithmetic/Logic Unit)
• Control Unit
• Input/Output System (I/O)

2) The program is stored in memory during execution.

3) Program instructions are executed sequentially.

Page 4: Types of DSP Architectures

The Von Neumann Architecture

[Figure: block diagram. Memory, the processor (CPU, containing the control unit and the ALU), and the input-output system are connected by a bus.]

• Memory: stores data and program.

• Control unit: executes the program.

• ALU: does the arithmetic/logic operations requested by the program.

• Input-output: communicates with the "outside world", e.g. screen, keyboard, storage devices, ...

Page 5: Types of DSP Architectures

Von Neumann Architecture

The Data Bus: Transports data between the CPU and its peripherals. It is bidirectional: the CPU can read or write data in the peripherals.

The Address Bus: The CPU uses the address bus to indicate which peripheral it wants to access and, within each peripheral, which specific register. The address bus is unidirectional: the CPU always writes the address, which is read by the peripherals.

The Control Bus: This bus carries signals that are used to manage and synchronize the exchanges between the CPU and its peripherals. Special lines of the control bus, such as interrupt or wait-state signals, carry information from the peripherals to the CPU.

Page 6: Types of DSP Architectures

Memory Subsystem

Memory, also called RAM (Random Access Memory):

• Consists of many memory cells (storage units) of a fixed size. Each cell has an address associated with it: 0, 1, …

• All accesses to memory are to a specified address. A cell is the minimum unit of access (fetch/store a complete cell).

• The time it takes to fetch/store a cell is the same for all cells.

When the computer is running, both the program and its data (variables) are stored in the memory.

Page 7: Types of DSP Architectures

Memory Size / Speed

Typical memory in a personal computer (PC): 64 MB - 256 MB

Memory sizes:
• Kilobyte (KB) = 2^10 = 1,024 bytes ~ 1 thousand
• Megabyte (MB) = 2^20 = 1,048,576 bytes ~ 1 million
• Gigabyte (GB) = 2^30 = 1,073,741,824 bytes ~ 1 billion

Memory access time (read from / write to memory): 50-75 nanoseconds (1 nsec. = 0.000000001 sec.)

RAM is:
• volatile (can only store when power is on)
• relatively expensive

Page 8: Types of DSP Architectures

Operations on Memory

Fetch (address):
• Fetches a copy of the content of the memory cell with the specified address.
• Non-destructive: copies the value in the memory cell.

Store (address, value):
• Stores the specified value into the memory cell specified by the address.
• Destructive: overwrites the previous value of the memory cell.

The memory system is interfaced via:
• Memory Address Register (MAR)
• Memory Data Register (MDR)
• Fetch/Store signal

Page 9: Types of DSP Architectures

Structure of the Memory Subsystem

Fetch(address):
• Load the address into MAR.
• Decode the address in MAR.
• Copy the content of the memory cell with the specified address into MDR.

Store(address, value):
• Load the address into MAR.
• Load the value into MDR.
• Decode the address in MAR.
• Copy the content of MDR into the memory cell with the specified address.

[Figure: memory subsystem, showing the MAR and MDR, the memory decoder circuit, and the fetch/store controller driven by the F/S signal.]
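The fetch and store sequences can be made concrete with a small model. The following is only a sketch in Python: the class name `Memory`, the `_decode` helper, and the 256-cell size are assumptions made for illustration, and the decoder circuit is reduced to a simple range check.

```python
class Memory:
    """Toy memory subsystem with a MAR, an MDR, and fetch/store operations."""

    def __init__(self, size):
        self.cells = [0] * size   # fixed-size cells, addressed 0, 1, ...
        self.mar = 0              # Memory Address Register
        self.mdr = 0              # Memory Data Register

    def _decode(self):
        # Stand-in for the decoder circuit: validate the address and select the cell.
        if not 0 <= self.mar < len(self.cells):
            raise IndexError(f"address {self.mar} out of range")
        return self.mar

    def fetch(self, address):
        self.mar = address               # load the address into MAR
        cell = self._decode()            # decode the address in MAR
        self.mdr = self.cells[cell]      # copy the cell content into MDR
        return self.mdr

    def store(self, address, value):
        self.mar = address               # load the address into MAR
        self.mdr = value                 # load the value into MDR
        cell = self._decode()            # decode the address in MAR
        self.cells[cell] = self.mdr      # copy MDR into the selected cell


mem = Memory(256)
mem.store(99, 7)
print(mem.fetch(99))   # reads back the stored value; the cell itself is unchanged
```

Fetching the same address twice returns the same value (non-destructive), while a second store to that address replaces it (destructive).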

Page 10: Types of DSP Architectures

Input/Output Subsystem

Handles devices that allow the computer system to:
• Communicate and interact with the outside world: screen, keyboard, printer, ...
• Store information (mass storage): hard drives, floppies, CDs, tapes, …

Mass-storage device access methods:
• Direct Access Storage Devices (DASDs): hard drives, floppy disks, CD-ROMs, ...
• Sequential Access Storage Devices (SASDs): tapes (for example, used as backup devices)

Page 11: Types of DSP Architectures

The ALU Subsystem

The ALU (Arithmetic/Logic Unit) performs:
• mathematical operations (+, -, x, /, …)
• logic operations (=, <, >, and, or, not, ...)

In today's computers it is integrated into the CPU.

Consists of:
• Circuits to do the arithmetic/logic operations.
• Registers (fast storage units) to store intermediate computational results.
• A bus that connects the two.

Page 12: Types of DSP Architectures

Structure of the ALU

Registers:
• Very fast local memory cells that store operands of operations and intermediate results.
• CCR (condition code register): a special-purpose register that stores the result of <, =, > operations.

ALU circuitry:
• Contains an array of circuits to do mathematical/logic operations.

Bus:
• Data path interconnecting the registers to the ALU circuitry.

[Figure: registers R0, R1, R2, ..., Rn connected by a bus to the ALU circuitry, with the GT, EQ, and LT condition-code bits.]

Page 13: Types of DSP Architectures

The Control Unit

The program is stored in memory as machine language instructions, in binary.

The task of the control unit is to execute programs by repeatedly:
• Fetching from memory the next instruction to be executed.
• Decoding it, that is, determining what is to be done.
• Executing it by issuing the appropriate signals to the ALU, memory, and I/O subsystems.

This continues until the HALT instruction.

Page 14: Types of DSP Architectures

Machine Language Instructions

A machine language instruction consists of:
• Operation code, telling which operation to perform.
• Address field(s), telling the memory addresses of the values on which the operation works.

Example: ADD X, Y (add the contents of memory locations X and Y, and store the result back in memory location Y).

Assume the opcode for ADD is 9, and the addresses are X = 99, Y = 100:

Opcode (8 bits)   Address 1 (16 bits)   Address 2 (16 bits)
00001001          0000000001100011      0000000001100100
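The bit pattern above can be reproduced by packing the three fields into one 40-bit word. This is a sketch only: the function names `encode`, `decode`, and `to_fields` are hypothetical, and placing the opcode in the most significant bits is an assumption taken from the left-to-right layout of the example.

```python
OPCODE_BITS = 8
ADDRESS_BITS = 16


def encode(opcode, addr1, addr2):
    """Pack an opcode and two addresses into one 40-bit instruction word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 8 bits")
    for addr in (addr1, addr2):
        if not 0 <= addr < (1 << ADDRESS_BITS):
            raise ValueError("address does not fit in 16 bits")
    return (opcode << (2 * ADDRESS_BITS)) | (addr1 << ADDRESS_BITS) | addr2


def decode(word):
    """Split a 40-bit instruction word back into (opcode, addr1, addr2)."""
    mask = (1 << ADDRESS_BITS) - 1
    return word >> (2 * ADDRESS_BITS), (word >> ADDRESS_BITS) & mask, word & mask


def to_fields(word):
    """Render the word as three binary strings: opcode, address 1, address 2."""
    bits = format(word, "040b")
    return bits[:8], bits[8:24], bits[24:]


word = encode(9, 99, 100)          # ADD X, Y with X = 99, Y = 100
print(" ".join(to_fields(word)))   # 00001001 0000000001100011 0000000001100100
print(decode(word))                # (9, 99, 100)
```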

Page 15: Types of DSP Architectures

Instruction Set Design

Two different approaches:

Reduced Instruction Set Computers (RISC):
• Instruction set as small and simple as possible.
• Minimizes the amount of circuitry --> faster computers.

Complex Instruction Set Computers (CISC):
• More instructions, many very complex.
• Each instruction can do more work, but requires more circuitry.

Page 16: Types of DSP Architectures

Structure of the Control Unit

• PC (Program Counter): stores the address of the next instruction to fetch.

• IR (Instruction Register): stores the instruction fetched from memory.

• Instruction Decoder: decodes the instruction and activates the necessary circuitry.

[Figure: control unit, showing the PC with its +1 incrementer, the IR, and the instruction decoder.]

Page 17: Types of DSP Architectures

[Figure: von Neumann architecture.]

Page 18: Types of DSP Architectures

How does this all work together?

Program Execution:

PC is set to the address where the first program instruction is stored in memory.

Repeat until HALT instruction or fatal error:
• Fetch instruction
• Decode instruction
• Execute instruction
End of loop

Page 19: Types of DSP Architectures

Program Execution (cont.)

Fetch phase:
• PC --> MAR (put address in PC into MAR)
• Fetch signal (signal memory to fetch value into MDR)
• MDR --> IR (move value to Instruction Register)
• PC + 1 --> PC (increase address in program counter)

Decode phase:
• IR --> Instruction decoder (decode instruction in IR)
• The instruction decoder will then generate the signals to activate the circuitry to carry out the instruction.

Page 20: Types of DSP Architectures

Program Execution (cont.)

Execute phase: differs from one instruction to the next.

Example: LOAD X (load value in addr. X into register)
• IR_address --> MAR
• Fetch signal
• MDR --> R

ADD X is left as an exercise.
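The register transfers of the fetch phase and of LOAD X can be traced step by step. The sketch below is illustrative: the dictionary-based CPU state, the function names, and the representation of an instruction as an `(opcode, address)` tuple are assumptions, not part of the machine described here.

```python
def fetch_phase(cpu, memory):
    """Carry out the four register transfers of the fetch phase."""
    cpu["MAR"] = cpu["PC"]             # PC --> MAR
    cpu["MDR"] = memory[cpu["MAR"]]    # Fetch signal: memory places the cell in MDR
    cpu["IR"] = cpu["MDR"]             # MDR --> IR
    cpu["PC"] += 1                     # PC + 1 --> PC


def execute_load(cpu, memory):
    """Carry out the execute phase of LOAD X."""
    _, address = cpu["IR"]             # address field of the instruction in IR
    cpu["MAR"] = address               # IR_address --> MAR
    cpu["MDR"] = memory[cpu["MAR"]]    # Fetch signal
    cpu["R"] = cpu["MDR"]              # MDR --> R


# Address 0 holds the instruction LOAD 10; address 10 holds the data value 42.
memory = {0: ("LOAD", 10), 10: 42}
cpu = {"PC": 0, "MAR": 0, "MDR": 0, "IR": None, "R": 0}

fetch_phase(cpu, memory)
execute_load(cpu, memory)
print(cpu["R"], cpu["PC"])   # 42 1
```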

Page 21: Types of DSP Architectures

Instruction Set for Our Von Neumann Machine

Opcode   Operation      Meaning
0000     LOAD X         CON(X) --> R
0001     STORE X        R --> CON(X)
0010     CLEAR X        0 --> CON(X)
0011     ADD X          R + CON(X) --> R
0100     INCREMENT X    CON(X) + 1 --> CON(X)
0101     SUBTRACT X     R - CON(X) --> R
0110     DECREMENT X    CON(X) - 1 --> CON(X)
0111     COMPARE X      If CON(X) > R then GT = 1 else 0
                        If CON(X) = R then EQ = 1 else 0
                        If CON(X) < R then LT = 1 else 0
1000     JUMP X         Get next instruction from memory location X
1001     JUMPGT X       Get next instruction from memory loc. X if GT = 1
...      JUMPxx X       xx = LT / EQ / NEQ
1101     IN X           Input an integer value and store in X
1110     OUT X          Output, in decimal notation, content of mem. loc. X
1111     HALT           Stop program execution
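To see the instruction set in action, here is a minimal interpreter sketch. Several things are assumptions made for illustration: instructions are stored as `(mnemonic, address)` tuples rather than binary words, IN reads from a list instead of a keyboard, OUT appends to a list instead of printing, and the sample program (which sums N + (N-1) + ... + 1) is hypothetical.

```python
def run(memory, inputs=(), max_steps=10_000):
    """Run the program stored in `memory` from address 0; return the OUT values."""
    pc, r = 0, 0
    gt = eq = lt = 0
    pending, outputs = list(inputs), []
    for _ in range(max_steps):
        op, x = memory[pc]              # fetch and decode
        pc += 1
        if op == "LOAD":
            r = memory[x]
        elif op == "STORE":
            memory[x] = r
        elif op == "CLEAR":
            memory[x] = 0
        elif op == "ADD":
            r += memory[x]
        elif op == "INCREMENT":
            memory[x] += 1
        elif op == "SUBTRACT":
            r -= memory[x]
        elif op == "DECREMENT":
            memory[x] -= 1
        elif op == "COMPARE":
            gt, eq, lt = int(memory[x] > r), int(memory[x] == r), int(memory[x] < r)
        elif op == "JUMP":
            pc = x
        elif op == "JUMPGT":
            pc = x if gt else pc
        elif op == "JUMPLT":
            pc = x if lt else pc
        elif op == "JUMPEQ":
            pc = x if eq else pc
        elif op == "JUMPNEQ":
            pc = x if not eq else pc
        elif op == "IN":
            memory[x] = pending.pop(0)
        elif op == "OUT":
            outputs.append(memory[x])
        elif op == "HALT":
            return outputs
        else:
            raise ValueError(f"unknown instruction {op!r}")
    raise RuntimeError("step limit reached without HALT")


# Cells 20 (N), 21 (sum), and 22 (constant 0) hold data; the rest is program.
program = [
    ("IN", 20), ("CLEAR", 21),
    ("LOAD", 21), ("ADD", 20), ("STORE", 21),   # sum = sum + N
    ("DECREMENT", 20),
    ("LOAD", 22), ("COMPARE", 20),              # GT = 1 while N > 0
    ("JUMPGT", 2),
    ("OUT", 21), ("HALT", None),
]
memory = program + [0] * 12
print(run(memory, inputs=[4]))   # [10]
```

Program and data share one list, which mirrors the stored-program idea: nothing but the program counter distinguishes an instruction cell from a data cell.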

Page 22: Types of DSP Architectures

HARVARD ARCHITECTURE

Page 23: Types of DSP Architectures

HARVARD ARCHITECTURE

• A DSP chip is a microprocessor specially designed for DSP applications.

• Harvard architecture allows multiple memory reads.

• Architecture optimized to provide rapid processing of discrete-time signals, e.g. Multiply and Accumulate (MAC) in one cycle.

[Figure: Harvard architecture block diagram. Program memory and data memory connect over separate buses (program memory data bus, data memory data bus) through multiplexers to the ALU and accumulator.]
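The multiply-and-accumulate step is the inner operation of most DSP kernels. As a rough illustration, the sketch below computes an FIR filter as a chain of MACs. The function names are hypothetical, and Python does not model cycle timing: on a DSP each loop iteration would map to one MAC instruction, with the two operands read in the same cycle.

```python
def mac_dot(coefficients, samples):
    """Sum of products computed as a chain of multiply-accumulate steps."""
    accumulator = 0.0
    for c, x in zip(coefficients, samples):
        accumulator += c * x   # one MAC: multiply, then add to the running total
    return accumulator


def fir_filter(coefficients, signal):
    """Direct-form FIR filter: one dot product per output sample."""
    history = [0.0] * len(coefficients)   # most recent sample first
    output = []
    for sample in signal:
        history = [sample] + history[:-1]
        output.append(mac_dot(coefficients, history))
    return output


# Two-tap moving average of a short test signal.
print(fir_filter([0.5, 0.5], [1.0, 3.0, 5.0]))   # [0.5, 2.0, 4.0]
```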

Page 24: Types of DSP Architectures

HARVARD ARCHITECTUREHARVARD ARCHITECTURE

The Harvard architecture – as the name implies – was developed at Harvard University.

In contrast to the Von Neumann architecture, it uses two separate bus systems to transport the instruction codes and the data being processed by the CPU.

The Program Bus System: Is used exclusively to transport instruction codes from the program memory to the CPU during the fetch cycle.

The Data Bus System: Is used exclusively to transport data from/to the CPU, to/from the memory and peripherals.

Page 25: Types of DSP Architectures

PROS AND CONS OF EACH ARCHITECTURE

• The Harvard architecture (HA) is capable of simultaneously reading an instruction code and reading or writing a memory or peripheral as part of the execution of the previous instruction.

• It has a speed advantage over the Von Neumann architecture.

• The HA is also safer, since it is not possible for the CPU to mistakenly write codes into the program memory and therefore corrupt the code while it is executing.

• The HA is less flexible. It needs two independent memory banks (one for program and another one for data). These two resources are not interchangeable.

• For an embedded system that always runs the same application, the Harvard architecture is most suitable (memory needs can be easily anticipated).

Page 26: Types of DSP Architectures

PROS AND CONS OF EACH ARCHITECTURE

• A computer system, however, may run a wide variety of applications, requiring large data and program memories.

• The Von Neumann architecture is better suited for this type of situation, because program and data memories are interchangeable, which leads to better usage of the memory resources.

Page 27: Types of DSP Architectures

The SHARC

Super Harvard Architecture Computer

Page 28: Types of DSP Architectures

The SHARC

• Developed by Analog Devices.
• Optimized for demanding DSP and imaging applications.
• 32-bit floating point, with 40-bit extended floating-point capabilities.
• Large on-chip memory.
• Ideal for scalable multi-processing applications.

Page 29: Types of DSP Architectures

Super Harvard Architecture

Many processors employ a Harvard architecture by having two separate memories or caches integrated into the processor chip.

The SHARC is unique in that its internal memory is capable of holding a large program as well as a large amount of data. This is what makes it SUPER!

Page 30: Types of DSP Architectures

SHARC DSP

The SHARC incorporates features aimed at optimizing the inner loops of DSP algorithms:
• High-speed floating-point capability.
• Extended floating point.

These features are DSP-specific: when applied to a non-DSP application, performance may not be as good.

Page 31: Types of DSP Architectures

Floating Point and Extended Floating Point

The SHARC supports Floating, Extended-Floating and Non-Floating point operations.

No additional clock cycles for floating point computations.

Data automatically truncated and zero padded when moved between 32-bit memory and internal registers.

Excellent signal to noise ratio.

Page 32: Types of DSP Architectures

SHARC’s Internal Memory

Makes the SHARC unique.

Size:
• Allows many complex functions to be performed on-chip, eliminating the need to move data between internal and external memory.
• Memory size is significantly larger than that of most other high-speed computational devices.

Dual-block, dual-port:
• Optimizes the Harvard architecture by allowing the fetch of instructions while performing data memory accesses.

Page 33: Types of DSP Architectures

Multiply and Accumulate Instructions on the SHARC

Like most DSPs, the SHARC is able to compute a product and add the product to a running total in a single clock cycle.

The SHARC’s "super instruction" can multiply and accumulate while adding, subtracting, or averaging data in two other registers.

These instructions give the SHARC its 120 megaflop rating.

Page 34: Types of DSP Architectures

DAGs on the SHARC

Data Address Generators (DAGs) are integer computation units that manage the indexing of registers.

They allow the SHARC to fetch a value and update the index value.

If the updated value exceeds a limit, the DAG adjusts the index so that it wraps.

This occurs in the same clock cycle as the read or write.
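The fetch-then-update behaviour with wrap-around can be modelled in a few lines. This is a software sketch only: the class name `AddressGenerator` and the field names `base`, `length`, `modify`, and `index` are illustrative, and the hardware does the update in the same cycle as the access, which Python cannot show.

```python
class AddressGenerator:
    """Toy model of one address generator: an index that post-modifies and wraps."""

    def __init__(self, base, length, modify=1):
        self.base = base        # first address of the buffer
        self.length = length    # number of entries before the index wraps
        self.modify = modify    # step added to the index after each access
        self.index = base       # address used by the next access

    def next_address(self):
        """Return the current address, then update the index with wrap-around."""
        address = self.index
        updated = self.index + self.modify
        # Keep the index inside [base, base + length); this also handles negative steps.
        self.index = self.base + (updated - self.base) % self.length
        return address


dag = AddressGenerator(base=100, length=4)
print([dag.next_address() for _ in range(6)])   # [100, 101, 102, 103, 100, 101]
```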

Page 35: Types of DSP Architectures

DAG Capabilities

Circular Buffering:
• Rather than actually moving data in and out of a vector, circular buffers are used.
• By updating the index modulo the buffer length, the oldest entry can be conveniently replaced by the newest entry.

Bit-Reverse Addressing:
• The bit pattern of a vector index is reversed.
• Done automatically by the SHARC.
• Required for the Fast Fourier Transform (FFT), which is often critical to DSP applications.
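Both addressing modes are easy to show in software, although on the SHARC they cost no extra instructions. The sketch below is an assumption-laden illustration: the names `CircularBuffer`, `bit_reverse`, and `bit_reversed_order` are hypothetical, and the bit reversal is written as an explicit loop rather than the hardware address transformation.

```python
class CircularBuffer:
    """Fixed-length buffer in which the newest sample overwrites the oldest."""

    def __init__(self, length):
        self.data = [0.0] * length
        self.newest = length - 1   # index of the most recent entry

    def push(self, sample):
        # Advance the index modulo the length; no existing entries are moved.
        self.newest = (self.newest + 1) % len(self.data)
        self.data[self.newest] = sample


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Indices 0..n-1 in bit-reversed order, as used to reorder FFT data."""
    bits = n.bit_length() - 1
    if n <= 0 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]


buf = CircularBuffer(3)
for sample in (1.0, 2.0, 3.0, 4.0):
    buf.push(sample)
print(buf.data)                # [4.0, 2.0, 3.0]: sample 1.0 was overwritten
print(bit_reversed_order(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```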

Page 36: Types of DSP Architectures

SHARC DSP

What makes the SHARC unique?
• It also has some features not directly related to optimizing numeric computations.
• Pipelining.
• Handling branches.

Why has this not emerged sooner?
• Technology has only recently become available to make it economical to integrate general single computing devices.

Page 37: Types of DSP Architectures

SHARC’s Pipeline

3 stages:
1. Instruction Fetch
2. Decode
3. Execution

It takes three clock cycles for an instruction to propagate through the pipeline.

The processor's execution speed is one instruction per clock cycle, even though each instruction requires three clock cycles.
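A cycle-by-cycle table shows why throughput is one instruction per cycle even though latency is three cycles. The following sketch assumes an ideal pipeline with no stalls or branches; the function name `pipeline_schedule` and the instruction labels are hypothetical.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(instructions):
    """For each clock cycle, list which instruction occupies each stage (or None)."""
    depth = len(STAGES)
    total_cycles = len(instructions) + depth - 1 if instructions else 0
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for stage_number, stage in enumerate(STAGES):
            i = cycle - stage_number   # instruction that entered this many cycles ago
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        schedule.append(row)
    return schedule


program = ["I1", "I2", "I3", "I4"]
for cycle, row in enumerate(pipeline_schedule(program), start=1):
    print(cycle, row)
```

Four instructions finish in 4 + 3 - 1 = 6 cycles rather than 12: once the pipeline is full, one instruction completes every cycle.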

Page 38: Types of DSP Architectures

Multi-processing

• The SHARC is uniquely equipped for multi-processing.
• Its link ports provide very powerful multi-processing capabilities.
• Two main program models, depending on the application.
• Adapts well to different multi-processing architectures.

Page 39: Types of DSP Architectures

Multi-processing: SHARC Links

The SHARC has 6 link ports that can transport data at rates up to 40 Mbytes/sec.

Links designed for point-to-point connections.

Data can be transmitted in either direction but not both simultaneously.

Page 40: Types of DSP Architectures

Multi-processing Program Model: MIMD

Multiple instruction, multiple data.

Good for applications that require multiple instruction threads to execute concurrently.

Processors operate individually. Each processor executes different code.

Typically used for image reconstruction and multi-channel DSP.

Page 41: Types of DSP Architectures

Multi-processing Program Model: SIMD

• Single instruction, multiple data.
• Works best when all processors execute identical instruction sequences.
• Does not require overhead for inter-processor synchronization.
• Typically used for synthetic aperture radar (SAR) and automatic target recognition.

(SAR is a form of radar which is used to create images of an object, such as a landscape.)

Page 42: Types of DSP Architectures

Summary of what makes the SHARC Super

It performs excellently for DSP applications.

Employs a Harvard architecture with very large on-chip memory.

Respectable megaflop rating.

Its multiprocessing capabilities.

Page 43: Types of DSP Architectures

VLIW (Very Long Instruction Word) Architecture

• VLIW architectures execute multiple instructions per cycle and use simple, regular instruction sets.
• More parallelism, higher performance.
• Better compiler target.
• Multiple independent instructions per cycle, packed into a single large "instruction word" or "packet".
• Large, uniform register sets.
• Wide program and data buses.
• Invented by Josh Fisher in his research group at Yale University in the early 1980s.

Page 44: Types of DSP Architectures

Very Long Instruction Word (VLIW)

• A technique for instruction-level parallelism by executing instructions without dependencies (known at compile-time) in parallel

• Example of a single VLIW instruction:

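F=a+b; c=e/g; d=x&y; w=z*h;

[Figure: a single VLIW instruction with four slots (F=a+b, c=e/g, d=x&y, w=z*h). Each slot is dispatched to its own processing unit (PU), which reads its two operands (a, b; e, g; x, y; z, h) and writes its result (F, c, d, w).]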
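The example instruction works because none of the four operations depends on another. The sketch below checks that property and then evaluates all slots from the same register state, which is what a compiler and the hardware would do between them. The names `BUNDLE`, `is_independent`, and `execute_bundle`, the tuple layout, and the register values are all assumptions made for illustration.

```python
import operator

# Each slot is (destination, operation, source 1, source 2).
BUNDLE = [
    ("F", operator.add, "a", "b"),
    ("c", operator.truediv, "e", "g"),
    ("d", operator.and_, "x", "y"),
    ("w", operator.mul, "z", "h"),
]


def is_independent(bundle):
    """True if no slot reads or writes a register that another slot writes."""
    destinations = [dest for dest, _, _, _ in bundle]
    if len(set(destinations)) != len(destinations):
        return False                       # two slots write the same register
    for dest, _, src1, src2 in bundle:
        written_by_others = set(destinations) - {dest}
        if src1 in written_by_others or src2 in written_by_others:
            return False                   # a slot needs another slot's result
    return True


def execute_bundle(bundle, registers):
    """Read every operand first, then write every result, as if all slots ran at once."""
    if not is_independent(bundle):
        raise ValueError("bundle contains dependent operations")
    results = {dest: op(registers[s1], registers[s2]) for dest, op, s1, s2 in bundle}
    registers.update(results)
    return registers


regs = {"a": 1, "b": 2, "e": 9, "g": 3, "x": 6, "y": 3, "z": 4, "h": 5}
execute_bundle(BUNDLE, regs)
print(regs["F"], regs["c"], regs["d"], regs["w"])   # 3 3.0 2 20
```

A pair such as F=a+b; c=F*a would be rejected, because the second operation needs the result of the first and so cannot share an instruction word with it.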

Page 45: Types of DSP Architectures

VLIW: Simplified Architecture Example

[Figure: program memory supplies a 256-bit instruction word consisting of 8 instructions, each 32 bits wide, to eight execution units, each unit executing one instruction.]
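The 256-bit word in this example is simply eight 32-bit instruction codes laid side by side. A minimal sketch of packing and unpacking such a word follows; the names `pack` and `unpack`, the choice to place slot 0 in the most significant bits, and the sample instruction codes are assumptions, since the slide does not specify an ordering or encoding.

```python
SLOTS = 8
SLOT_BITS = 32


def pack(instructions):
    """Pack eight 32-bit instruction codes into one 256-bit word (slot 0 on top)."""
    if len(instructions) != SLOTS:
        raise ValueError("a packet holds exactly 8 instructions")
    word = 0
    for code in instructions:
        if not 0 <= code < (1 << SLOT_BITS):
            raise ValueError("instruction code does not fit in 32 bits")
        word = (word << SLOT_BITS) | code
    return word


def unpack(word):
    """Split a 256-bit word into the eight codes, one per execution unit."""
    mask = (1 << SLOT_BITS) - 1
    return [(word >> (SLOT_BITS * (SLOTS - 1 - slot))) & mask for slot in range(SLOTS)]


codes = [0x10000000 + n for n in range(SLOTS)]   # made-up instruction codes
word = pack(codes)
print(word.bit_length() <= SLOTS * SLOT_BITS)    # True
print(unpack(word) == codes)                     # True
```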