TMS320C4X Digital Signal Processing
[Figure: ’C4x block diagram]
The TMS320C4x devices are 32-bit floating-point digital signal processors optimized
for parallel processing. The ’C4x family combines a high performance CPU and DMA
controller with up to six communication ports to meet the needs of multiprocessor and
I/O-intensive applications. Each ’C4x device contains an on-chip analysis module, which
supports hardware breakpoints for parallel processing development and debugging. The
’C4x family is source-code compatible with the TMS320C3x family of floating-point
DSPs.
The TMS320C40 is the original member of the ’C4x family. It features a CPU that can
deliver up to 30 MIPS/60 MFLOPS with a maximum I/O bandwidth of 384M bytes/s.
The ’C40 has 2K words of on-chip RAM, 128 words of program cache and a bootloader.
Two external buses provide an address reach of 4 gigawords of unified memory space.
The ’C40 is available in a 325-pin CPGA package.
The TMS320C44
The TMS320C44 is a lower cost version of the ’C40, for parallel processing applications
that are more price sensitive. The ’C44 features four communication ports and has an
external address reach of 32M words over two external buses. To further reduce cost, the
’C44 comes in a 304-pin PQFP package. The TMS320C44 can deliver up to 30 MIPS/60
MFLOPS performance with a maximum I/O bandwidth of 384M bytes/s. The ’C44 is
source-code compatible with the ’C40.
1. Key Features of the TMS320C4x
The TMS320C4x has several key features:
_ Up to 40 MIPS/80 MFLOPS performance with 488-Mbytes/s I/O capability
_ IEEE floating-point conversion for ease of use
_ Register-based CPU
_ Single-cycle byte and half-word manipulation capabilities
_ Divide and square root support for improved performance
_ On-chip memory includes 2K words of SRAM, 128 words of program
cache, and bootloader
_ Two external buses providing an address reach of up to 4 gigawords
_ Two memory-mapped 32-bit timers
_ 6- and 12-channel DMA
_ Up to six communication ports for multiprocessor communication
_ Idle mode for reduced power consumption
Central Processing Unit (CPU)
The ’C4x’s CPU has a register-based architecture. The CPU consists of the
following components:
1. Floating-point/integer multiplier
2. Arithmetic Logic Unit (ALU)
3. 32-bit barrel shifter
4. Internal buses (CPU1/CPU2 and REG1/REG2)
5. Auxiliary register arithmetic units (ARAUs)
6. CPU register file
Floating-Point/Integer Multiplier
The multiplier performs single-cycle multiplications on 32-bit integer and 40-bit floating-
point values. The ’C4x implementation of floating-point arithmetic allows for floating-
point operations at fixed-point speeds via a 25-ns instruction cycle and a high degree of
parallelism. To gain even higher throughput, you can use parallel instructions to perform
a multiply and ALU operation in a single cycle.
When the multiplier performs floating-point multiplication, the inputs are 40-bit floating-
point numbers, and the result is a 40-bit floating-point number. When the multiplier
performs integer multiplication, the input data is 32 bits and yields either the 32 most-
significant bits or the 32 least-significant bits of the resulting 64-bit product.
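The choice between the two halves of the integer product can be modeled in C. This is a minimal sketch, not ’C4x code: the function names mul_low32 and mul_high32 are hypothetical, and the operands are assumed to be signed 32-bit integers. Each function forms the full 64-bit product and returns one half as a raw 32-bit pattern.

```c
#include <stdint.h>

/* Hypothetical helper: least-significant 32 bits of the 64-bit product. */
uint32_t mul_low32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;   /* full 64-bit product */
    return (uint32_t)((uint64_t)product);        /* keep bits 31-0 */
}

/* Hypothetical helper: most-significant 32 bits of the 64-bit product. */
uint32_t mul_high32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;   /* full 64-bit product */
    return (uint32_t)((uint64_t)product >> 32);  /* keep bits 63-32 */
}
```

For example, multiplying 65536 by 65536 gives a product of 2^32, so the low half is 0 and the high half is 1.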
Arithmetic Logic Unit (ALU) and Internal Buses
The ALU performs single-cycle operations on 32-bit integer, 32-bit logical, and
40-bit floating-point data, including single-cycle integer and floating-point conversions.
Results of the ALU are always maintained in 32-bit integer or 40-bit floating-point
formats. The barrel shifter is used to shift up to 32 bits left or right in a single cycle.
Four internal buses, CPU1, CPU2, REG1, and REG2, carry two operands from
memory and two operands from the register file, thus allowing parallel multiplies
and adds/subtracts on four integer or floating-point operands in a single cycle.
Auxiliary Register Arithmetic Units (ARAUs)
The two auxiliary register arithmetic units (ARAU0 and ARAU1) can generate
two addresses in a single cycle. The ARAUs operate in parallel with the multiplier
and ALU. They support addressing with displacements, index registers (IR0 and IR1),
and circular and bit-reversed addressing.
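Bit-reversed addressing is typically used to reorder FFT data. The following C sketch assumes it can be modeled as an addition whose carry propagates from the most-significant bit toward the least-significant bit. The function names and the nbits parameter (index width in bits, assumed to be between 1 and 31) are hypothetical and are not part of the ’C4x instruction set.

```c
#include <stdint.h>

/* Reverse the lowest nbits bits of v (hypothetical helper). */
uint32_t reverse_bits(uint32_t v, unsigned nbits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < nbits; i++) {
        r = (r << 1) | (v & 1u);   /* move the current LSB of v into r */
        v >>= 1;
    }
    return r;
}

/* Add step to index with the carry running from MSB to LSB.
 * Implemented as: reverse both operands, add normally, reverse the sum. */
uint32_t reverse_carry_add(uint32_t index, uint32_t step, unsigned nbits)
{
    uint32_t mask = (1u << nbits) - 1u;   /* requires nbits < 32 */
    uint32_t sum = reverse_bits(index, nbits) + reverse_bits(step, nbits);
    return reverse_bits(sum & mask, nbits);
}
```

With an 8-point buffer (nbits = 3) and a step of 4 (half the buffer size), repeated calls starting from index 0 visit 4, 2, 6, 1, 5, 3, 7 and then return to 0.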
CPU Primary Register File
The ’C4x primary register file provides 32 registers in a multiport register file that is
tightly coupled to the CPU. Table 2–1 lists register names and functions, followed by the
section number and page of each description. All of the primary register file registers can
be operated upon by the multiplier and ALU and can be used as general-purpose
registers. However, the registers also have some special functions. For example, the 12
extended-precision registers are especially suited for maintaining floating-point results.
The eight auxiliary registers support a variety of indirect addressing modes and can
be used as general-purpose 32-bit integer and logical registers. The remaining registers
provide system functions such as addressing, stack management, processor status,
interrupts, and block repeat.
The extended-precision registers (R0–R11) are capable of storing and supporting
operations on 32-bit integer and 40-bit floating-point numbers. Any instruction
that assumes that the operands are floating-point numbers uses bits 39–0. If the operands
are either signed or unsigned integers, only bits 31–0 are used, and bits 39–32 remain
unchanged. This is true for all shift operations.
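The effect of an integer result on a 40-bit extended-precision register can be sketched in C. This is an illustrative model only: the register is held in the low 40 bits of a 64-bit value, and the function name write_integer_result is hypothetical.

```c
#include <stdint.h>

#define REG40_MASK   0xFFFFFFFFFFULL   /* bits 39-0 */
#define REG40_UPPER8 0xFF00000000ULL   /* bits 39-32 */

/* Store a 32-bit integer result in a 40-bit register image.
 * Bits 31-0 are replaced; bits 39-32 keep their previous contents. */
uint64_t write_integer_result(uint64_t reg40, uint32_t value)
{
    return ((reg40 & REG40_UPPER8) | (uint64_t)value) & REG40_MASK;
}
```

For example, writing DEADBEEFh into a register that holds AB 1234 5678h leaves AB DEAD BEEFh.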
The 32-bit auxiliary registers (AR0–AR7) can be accessed by the CPU and
modified by the two auxiliary register arithmetic units (ARAUs). The primary
function of the auxiliary registers is the generation of 32-bit addresses. They
can also be used as loop counters or as 32-bit general-purpose registers that
can be modified by the multiplier and ALU.
The data page pointer (DP) is a 32-bit register. The 16 LSBs of the data page
pointer are used by the direct addressing mode as a pointer to the page of data
being addressed. The ’C4x can address up to 64K pages, each page containing
64K words.
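The page arithmetic works out as 64K pages × 64K words = 4G words, which matches the full 32-bit address range. A minimal C sketch of how a direct address is formed, assuming the 16 LSBs of DP supply the upper half of the address and the 16-bit instruction operand supplies the lower half (the function name direct_address is hypothetical):

```c
#include <stdint.h>

/* Form a 32-bit direct address from the data page pointer and a
 * 16-bit operand. Only the 16 LSBs of dp take part. */
uint32_t direct_address(uint32_t dp, uint16_t operand)
{
    uint32_t page = dp & 0xFFFFu;          /* 16 LSBs select the page */
    return (page << 16) | (uint32_t)operand;
}
```

For example, with DP = 002Fh and an operand of F800h, the resulting address is 002F F800h.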
The 32-bit index registers contain the value used by the auxiliary register
arithmetic unit (ARAU) to compute an indexed address.
The ARAU uses the 32-bit block size register (BK) in circular addressing to
specify the data block size.
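The wrap-around behavior of circular addressing can be sketched in C. This is a simplified model under stated assumptions: the base address of the block is passed in explicitly (how the hardware derives it is not modeled), the current address lies within the block, and the step magnitude does not exceed the block size. The function name circular_advance is hypothetical.

```c
#include <stdint.h>

/* Advance addr by step inside a circular block of bk words that
 * starts at base. Assumes base <= addr < base + bk and |step| <= bk. */
uint32_t circular_advance(uint32_t base, uint32_t addr, int32_t step, uint32_t bk)
{
    int64_t offset = (int64_t)addr - (int64_t)base + (int64_t)step;
    if (offset >= (int64_t)bk) {
        offset -= (int64_t)bk;      /* ran past the end: wrap to the start */
    } else if (offset < 0) {
        offset += (int64_t)bk;      /* ran before the start: wrap to the end */
    }
    return base + (uint32_t)offset;
}
```

With a 4-word block at 100h, stepping forward by 1 from 103h returns 100h, and stepping back by 1 from 100h returns 103h.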
The system stack pointer (SP) is a 32-bit register that contains the address
of the top of the system stack. The SP always points to the last element pushed
onto the stack. A push performs a pre-increment, and a pop performs a post-decrement
of the system stack pointer. The SP is manipulated by interrupts, traps, calls, returns, and
the PUSH/PUSHF and POP/POPF instructions.
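Because a push pre-increments SP and a pop post-decrements it, the stack grows toward higher addresses. The following C sketch models this with a small array standing in for memory. The type and function names are hypothetical, and no overflow or underflow checking is performed, so the caller must keep SP within the array.

```c
#include <stdint.h>

#define STACK_WORDS 16u

/* Hypothetical model: mem stands in for memory, sp indexes into it. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;               /* points to the last element pushed */
} Stack;

/* Push: increment SP first, then store at the new SP. */
void stack_push(Stack *s, uint32_t value)
{
    s->sp += 1u;
    s->mem[s->sp] = value;
}

/* Pop: read from the current SP, then decrement SP. */
uint32_t stack_pop(Stack *s)
{
    uint32_t value = s->mem[s->sp];
    s->sp -= 1u;
    return value;
}
```

After two pushes starting from SP = 0, SP equals 2 and points at the second value; two pops return the values in reverse order and restore SP to 0.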
The status register (ST) contains global information related to the state of the CPU.
Typically, operations set the condition flags of the status register according to whether
the result is zero, negative, etc. This includes register load and store operations as well as
arithmetic and logical functions. When the status register is loaded, however, a bit-for-bit
replacement is performed with the contents of the source operand, regardless of the state
of any bits in the source operand. Therefore, following a load, the contents of the status
register are identically equal to the contents of the source operand.
The DMA coprocessor interrupt enable register (DIE) is a 32-bit register
containing 2- and 3-bit fields to designate the interrupt synchronization
scheme for each of the six DMA channels. It allows each DMA channel to service
a corresponding input communication port and output communication
port. Also, each DMA channel can be synchronized with external interrupts or
the on-chip timers.
The CPU internal interrupt enable register (IIE) is a 32-bit register that enables/
disables interrupts for the six communication ports, both timers, and the
six DMA coprocessor channels.
The IIOF flag register (IIF) controls the function (general-purpose I/O or interrupt)
of the four external pins (IIOF0 to IIOF3). It also contains timer/DMA interrupt
flags.
The 32-bit repeat counter (RC) register specifies the number of times a block
of code is to be repeated when a block repeat is performed. When the processor
is operating in the repeat mode, the 32-bit repeat start address register
(RS) contains the starting address of the block of program memory to be repeated,
and the 32-bit repeat end address register (RE) contains the ending
address of the block to be repeated.
The program counter (PC) is a 32-bit register containing the address of the
next instruction to be fetched. Although the PC is not part of the CPU register
file, it is a register that can be modified by instructions that modify the program
flow.
CPU Expansion Register File
Besides the CPU primary register file, the expansion register file contains two
special registers that act as pointers:
_ The IVTP register points to the interrupt-vector table (IVT), which defines
vectors for all interrupts.
_ The TVTP register points to the trap vector table (TVT), which defines vectors
for 512 traps.
Memory Organization
The total memory reach of the ’C4x is 4G 32-bit words. Program memory (on-chip
RAM or ROM and external memory) as well as registers affecting timers,
communication ports, and DMA channels are contained within this space. This
allows tables, coefficients, program code, and data to be stored in either RAM
or ROM. Thus, memory usage can be maximized and memory space allocated as
desired.
By manipulating one external pin (ROMEN), you can configure the first one-megaword
area of memory (0000 0000h to 000F FFFFh) to address the local
address bus or to address the on-chip ROM when you use the bootloader (with
remaining space reserved).
2.1 RAM, ROM, and Cache
The ROM block is reserved and contains a bootloader. Each RAM and ROM block is
capable of supporting two accesses in a single cycle. The separate program buses, data
buses, and DMA buses allow for parallel program fetches, data reads and writes, and
DMA operations. For example, the CPU can access two data values in one RAM block
and perform an external program fetch in parallel with the DMA coprocessor loading
another RAM block, all within a single cycle. The bootloader in the reserved ROM block
supports loading of program and data at reset time.
Loading is from 8-, 16-, or 32-bit-wide memories or any one of the six communication
ports. A 128-word (128 × 32-bit) instruction cache is provided to store often-repeated
sections of code, thus greatly reducing the number of off-chip accesses needed. This
allows code to be stored off-chip in slower, lower-cost memories. When your program
executes from the cache, the external buses are freed for use by the DMA controller or CPU.
Memory Maps
For each processor, the level at the external pin ROMEN determines whether the
first megaword of memory addresses the internal ROM or external memory. The maps
illustrate the entire address space of the ’C40 and ’C44.
The value of ROMEN affects only the first megaword of memory:
_ A 1 at external pin ROMEN causes internal ROM to be enabled at 0000 0000h
with the one-megaword space reserved (0000 0000h – 000F FFFFh).
This is shown on the right side of the figure.
_ A 0 at ROMEN causes addresses 0000 0000h – 000F FFFFh to be accessible
on the local bus. This is shown on the left side of the figure.
The rest of the memory map is the same for either level of ROMEN:
_ The second megaword of memory is devoted to peripherals
_ The third megaword of memory contains the two 1K-word (4K-byte) blocks
of RAM (BLK0 and BLK1 as shown at 002F F800h – 002F FFFFh).
_ The rest of the first 2 gigawords (0030 0000h – 7FFF FFFFh) is on the local
bus (external).
_ The second 2 gigawords (8000 0000h – FFFF FFFFh) are on the global
bus (external).
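The memory map described above can be summarized as an address classifier. This is a minimal C sketch built only from the ranges listed here. The enum and function names are hypothetical, and the part of the third megaword outside the two RAM blocks is reported as a separate region because the text does not describe it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    REGION_BOOT_ROM_OR_RESERVED,   /* first megaword, ROMEN = 1 */
    REGION_PERIPHERALS,            /* second megaword */
    REGION_THIRD_MEGAWORD_OTHER,   /* third megaword outside BLK0/BLK1 */
    REGION_ON_CHIP_RAM,            /* BLK0 and BLK1 */
    REGION_LOCAL_BUS,
    REGION_GLOBAL_BUS
} Region;

/* Classify a 32-bit word address; romen is the level at the ROMEN pin. */
Region classify_address(uint32_t addr, bool romen)
{
    if (addr <= 0x000FFFFFu) {
        /* Only this first megaword depends on ROMEN. */
        return romen ? REGION_BOOT_ROM_OR_RESERVED : REGION_LOCAL_BUS;
    }
    if (addr <= 0x001FFFFFu) return REGION_PERIPHERALS;
    if (addr >= 0x002FF800u && addr <= 0x002FFFFFu) return REGION_ON_CHIP_RAM;
    if (addr <= 0x002FFFFFu) return REGION_THIRD_MEGAWORD_OTHER;
    if (addr <= 0x7FFFFFFFu) return REGION_LOCAL_BUS;
    return REGION_GLOBAL_BUS;      /* 8000 0000h - FFFF FFFFh */
}
```

For example, address 002F F800h is classified as on-chip RAM for either level of ROMEN, while address 0000 0000h is on the local bus only when ROMEN is 0.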
Caution
Any access to a reserved area in the address space produces unpredictable results.
Do not attempt to access reserved areas.
Memory Aliasing (’C44 only)
Memory aliasing occurs in the ’C44 because the global and local ports on
that device each have 24 address pins, instead of the 31 pins on each port in the ’C40.
Memory aliasing causes the first 16M words of each address space to be repeated
throughout the memory map. Memory on the local bus occupies, and is aliased within,
the first 2G words of address space; memory on the global bus occupies, and is
aliased within, the second 2G words of address space. Figure 2–7 shows the alias
regions on the local and global buses.
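Aliasing on the ’C44 can be sketched in C as follows. The model assumes that only the low 24 address bits reach the pins of either external bus and that bit 31 selects the global bus. It ignores the on-chip and reserved regions at the bottom of the map, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address as seen on the 24 external address pins. */
uint32_t c44_pin_address(uint32_t addr)
{
    return addr & 0x00FFFFFFu;   /* 2^24 = 16M words */
}

/* True if the address lies in the second 2G words (global bus). */
bool c44_is_global(uint32_t addr)
{
    return (addr & 0x80000000u) != 0u;
}

/* Two addresses alias when they use the same bus and drive the
 * same value onto the 24 address pins. */
bool c44_addresses_alias(uint32_t a, uint32_t b)
{
    return c44_is_global(a) == c44_is_global(b)
        && c44_pin_address(a) == c44_pin_address(b);
}
```

Under these assumptions, 0030 0000h and 0130 0000h alias on the local bus, while 0030 0000h and 8030 0000h do not alias because they belong to different buses.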
Memory Addressing Modes
The ’C4x supports a base set of general-purpose instructions as well as arithmetic-
intensive instructions that are particularly suited for digital signal processing
and other numeric-intensive applications. Refer to Chapter 6, Addressing
Modes, for detailed information on addressing.
Four groups of addressing modes are provided on the ’C4x. Each group uses
two or more of several different addressing types. The following list shows the
addressing modes with their addressing types.
_ General addressing modes:
  _ Register. The operand is a CPU register.
  _ Immediate. The operand is a 16-bit immediate value.
  _ Direct. The operand is the contents of a 32-bit address
    (concatenation of 16 bits of the data page pointer and a 16-bit
    operand).
  _ Indirect. A 32-bit auxiliary register indicates the address of the
    operand.
_ Three-operand addressing modes:
  _ Register. (same as for general addressing mode).
  _ Indirect. (same as for general addressing mode).
  _ Immediate. The operand is an 8-bit immediate value.
_ Parallel addressing modes:
  _ Register. The operand is an extended-precision register.
  _ Indirect. (same as for general addressing mode).
_ Branch addressing modes:
  _ Register. (same as for general addressing mode).
  _ PC-relative. A signed 16-bit displacement or a 24-bit displacement is
    added to the PC.
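The PC-relative calculation can be sketched in C as a sign extension followed by a 32-bit addition. This is a simplified model that follows the description above literally. Any extra offset that a particular branch instruction might apply to the PC is not modeled, and the function name branch_target is hypothetical.

```c
#include <stdint.h>

/* Add a signed displacement of the given width (16 or 24 bits) to pc.
 * field holds the raw displacement bits from the instruction word. */
uint32_t branch_target(uint32_t pc, uint32_t field, unsigned bits)
{
    uint32_t mask = (1u << bits) - 1u;     /* requires bits < 32 */
    uint32_t sign = 1u << (bits - 1u);     /* position of the sign bit */
    /* Sign-extend in unsigned arithmetic: flip the sign bit, then
     * subtract it. The result wraps modulo 2^32 like the hardware adder. */
    uint32_t displacement = ((field & mask) ^ sign) - sign;
    return pc + displacement;
}
```

For example, a 16-bit field of FFFFh represents a displacement of -1, so a PC of 1000h yields 0FFFh.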
Internal Bus Operation
A large portion of the ’C4x’s high performance is due to internal busing and parallelism.
Separate buses allow for parallel program fetches, data accesses,
and DMA accesses:
_ Program buses PADDR and PDATA
_ Data buses DADDR1, DADDR2, and DDATA
_ DMA buses DMAADDR and DMADATA
These buses connect all of the physical spaces (on-chip memory, off-chip
memory, and on-chip peripherals) supported by the ’C4x. Figure 2–3 shows
these internal buses and their connections to on-chip and off-chip memory
blocks.
The program counter (PC) is connected to the 32-bit program address bus
(PADDR). The instruction register (IR) is connected to the 32-bit program data
bus (PDATA). In this configuration, the buses can fetch a single instruction
word every machine cycle.
The 32-bit data address buses (DADDR1 and DADDR2) and the 32-bit data
data bus (DDATA) support two data memory accesses every machine cycle.
The DDATA bus carries data to the CPU over the CPU1 and CPU2 buses. The
CPU1 and CPU2 buses can carry two data memory operands to the multiplier,
ALU, and register file every machine cycle. Also internal to the CPU are register
buses REG1 and REG2, which can carry two data values from the register
file to the multiplier and ALU every machine cycle. Figure 2–2 shows the buses
that are internal to the CPU section of the processor.
The DMA controller is supported with a 32-bit address bus (DMAADDR) and
a 32-bit data bus (DMADATA). These buses allow the DMA to perform memory
accesses in parallel with the memory accesses occurring from the data and
program buses.
External Bus Operation
The ’C4x provides two identical external interfaces: the global memory interface
and the local memory interface. Each consists of a 32-bit data bus, a
31-bit (’C40) or 24-bit (’C44) address bus, and two sets of control signals. Both
buses can be used to address external program/data memory or I/O space.
The buses also have external RDY signals for wait-state generation with wait
states inserted under software control. Chapter 9, External Bus Operation,
covers external bus operation.
For multiple processors to access global memory and share data in a coherent
manner, arbitration is necessary. This arbitration (handshaking) is the purpose
of the ’C4x’s interlocked operations, handled through interlocked instructions.
Interrupts
The ’C4x supports four external interrupts (IIOF3–0), a number of internal interrupts,
a nonmaskable external NMI interrupt, and a nonmaskable external
RESET signal, which sets the processor to a known state. The DMA and communication
ports have their own internal interrupts. When the CPU responds
to the interrupt, the IACK pin can be used to signal an external interrupt acknowledge.
Peripherals
All ’C4x on-chip peripherals are controlled through memory-mapped registers
on a dedicated peripheral bus. This peripheral bus is composed of a 32-bit data
bus and a 32-bit address bus. This peripheral bus permits straightforward
communication to the peripherals. The ’C4x peripherals include two timers
and six (’C40) or four (’C44) communication ports.
Communication Ports
Six (’C40) or four (’C44) high-speed communication ports provide rapid processor-
to-processor communication through each port’s dedicated communication
interfaces. Coupled with the ’C4x’s two memory interfaces (global and
local), this allows you to construct a parallel processor system that attains optimum
system performance by distributing tasks among several processors.
Each ’C4x can pass the results of its work to another ’C4x through a communication
port, enabling each ’C4x to continue working. Chapter 12, Communication
Ports, explains communication port operation in detail.
The communication ports offer several features:
_ 160-Mbits/s (20 Mbytes/s or 5 Mwords/s) bidirectional data
transfer operations (at 40-ns cycle time)
_ Simple processor-to-processor communication via eight data lines and
four control lines
_ Buffering of all data transfers, both input and output
_ Automatic arbitration to ensure communication synchronization
_ Synchronization between the CPU or the direct-memory access (DMA)
coprocessor and the six communication ports via internal interrupts and
internal ready signals.
_ Port direction pin (CDIR) to ease interfacing (’C44 only)
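The three data-rate figures quoted for the communication ports are the same rate in different units: 160 Mbits/s ÷ 8 bits per byte = 20 Mbytes/s, and 20 Mbytes/s ÷ 4 bytes per 32-bit word = 5 Mwords/s. A short C sketch of the conversion, with hypothetical function names:

```c
#include <stdint.h>

/* Convert a rate in Mbits/s to Mbytes/s (8 bits per byte). */
uint32_t mbytes_per_second(uint32_t mbits_per_second)
{
    return mbits_per_second / 8u;
}

/* Convert a rate in Mbits/s to Mwords/s (32 bits per word). */
uint32_t mwords_per_second(uint32_t mbits_per_second)
{
    return mbits_per_second / 32u;
}
```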
Direct Memory Access (DMA) Coprocessor
The six channels of the on-chip DMA coprocessor can read from or write to any
location in the memory map without interfering with the operation of the CPU.
This allows interfacing to slow external memories and peripherals without reducing
throughput to the CPU. The DMA coprocessor contains its own address
generators, source and destination registers, and transfer counter. Dedicated
DMA address and data buses allow for minimization of conflicts between
the CPU and the DMA coprocessor. A DMA operation consists of a
block or single-word transfer to or from memory. A key feature of the DMA
coprocessor is its ability to automatically reinitialize each channel following a
data transfer.
Timers
The two timer modules are general-purpose 32-bit timer/event counters with
two signaling modes and internal or external clocking. They can signal internally
to the ’C4x or externally to the outside world at specified intervals, or they
can count external events. Each timer has an I/O pin that can be used as an
input clock to the timer, as an output signal driven by the timer, or as a
general-purpose I/O pin.