
Lab four in SMD077 lp2 2001

Introduction to a low power Digital Signal Processor

LAB FOUR

SMD077

lp2 – 2001


1 Objectives
2 Prelab
3 DSP assembly techniques
  3.1 Dual MAC
  3.2 Circular addressing
  3.3 Repeat instructions
4 Assignment
  4.1 Step one - Code Composer Studio
  4.2 Step two - Implementing the filter
  4.3 Step three - questions
  4.4 Hints
5 Goal
6 Submission
7 References


1 Objectives

The objective of this lab is to introduce the Texas Instruments TMS320C55x series of digital signal processors and the environment for developing DSP applications. The DSP is the latest in its family and combines very low power consumption with reasonable performance. It is mainly targeted at mobile devices such as 3G phones and high-bandwidth multimedia devices, and some versions, such as the C5510, have started sampling.

2 Prelab

Before you begin with the lab it is important that you go through the items listed below.
• Read through section 3, DSP assembly techniques, and use the Code Composer tutorial given with this assignment in section 4, Assignment.
• Read through the documents that are listed in the reference chapter.
• Read the hints at the end of this document.

3 DSP assembly techniques

We will present some examples that use assembly language to take advantage of a typical DSP architecture. The examples are divided into three parts: dual MAC, circular addressing and repeat instructions.

3.1 Dual MAC

The strength of the TMS320C55x lies in its ability to perform two multiply-and-accumulate (MAC) operations in a single cycle. The method relies on using the same coefficient for two consecutive input values, and since you have two MACs you work on two output values at a time.

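The syntax in mnemonic assembly is given below (the same instruction pair is used in section 3.3). MAC is the instruction, AR2 and AR3 point at the input samples, CDP points at the coefficient and the ACx are the accumulators.

MAC *AR2+, *CDP+, AC0
::MAC *AR3+, *CDP+, AC1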


ARx are 16-bit address registers. The ‘*’ denotes “the location the register points at” and the ‘+’ means post-increment the register by one after the memory access. The same goes for CDP, which is also an address register, but a somewhat special one. All pointer registers reside in the A-unit (Address Data Flow Unit), which is responsible for address generation within the processor. Using a register as a pointer is called indirect addressing.

Since the architectural layout of the target DSP permits three parallel memory reads, we use that to fetch two inputs and one coefficient at a time. One of these 16-bit data buses is connected only to the D-unit in the CPU, which is the one containing the main computational units. This bus carries the coefficients and is addressed by the CDP (Coefficient Data Pointer); the referenced memory must be internal memory (DARAM or SARAM).

AC0 and AC1 are accumulator registers which are 40 bits wide and reside in the D-unit (Data Computation Unit). They can be shifted, saturated and so on before the outputs that they hold are written to memory. The ‘::’ implies that both instructions can be run in parallel, and this is valid only because the CDP register has the same modification in both MAC instructions.
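The data flow of the dual MAC can be modelled in plain C. The following is only a sketch of the idea, not TI code: the function name `dual_mac` is hypothetical, and `int64_t` is used as a stand-in for the 40-bit accumulators. It shows how one coefficient fetch per step feeds two running sums.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of one dual-MAC pass over numtaps coefficients.
 * x0 plays the role of AR2, x1 of AR3 and h of CDP;
 * acc[0] and acc[1] stand in for AC0 and AC1 (assumption: int64_t
 * replaces the 40-bit hardware accumulators). */
void dual_mac(const int16_t *x0, const int16_t *x1,
              const int16_t *h, size_t numtaps, int64_t acc[2])
{
    acc[0] = 0;
    acc[1] = 0;
    for (size_t k = 0; k < numtaps; k++) {
        int16_t coeff = h[k];               /* single coefficient read, *CDP+ */
        acc[0] += (int32_t)x0[k] * coeff;   /* MAC *AR2+, *CDP+, AC0 */
        acc[1] += (int32_t)x1[k] * coeff;   /* ::MAC *AR3+, *CDP+, AC1 */
    }
}
```

Calling it with `x1 = x0 + 1` produces two neighbouring sums from the same coefficient stream, which is the pattern the parallel MAC pair exploits.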

3.2 Circular addressing

Each time we have finished working on two output values and want to restart the calculation on the next two, the coefficient pointer should be wrapped back. We could of course subtract the total number of filter coefficients each time, but is there a way to do this automatically?

The solution is to set up the coefficient buffer as a circular buffer; this feature is supported in hardware for all address registers.

The addressable data space is 23 bits wide, where each distinct address references a 16-bit word. This means that of the 23-bit address only 16 bits are taken from ARx or CDP, and the remaining 7 bits are taken from elsewhere.


These 7 bits make up the base pointer for the data page and the lower 16 bits the offset into that page. So a data page has a size of 65536 (64k) words, and you can have a total of 128 pages.
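The page and offset arithmetic can be checked with a few lines of C. This is a minimal sketch; the names `word_address`, `PAGE_WORDS` and `NUM_PAGES` are made up for the illustration and do not come from the lab files.

```c
#include <stdint.h>

#define PAGE_WORDS 65536u  /* 2^16 words per data page */
#define NUM_PAGES  128u    /* 2^7 data pages */

/* Combine a 7-bit page number and a 16-bit offset into a 23-bit word address. */
uint32_t word_address(uint8_t page, uint16_t offset)
{
    return ((uint32_t)(page & 0x7Fu) << 16) | offset;
}
```

For example, page 1 with offset 0 gives word address 65536, and the highest address, page 127 with offset 0xFFFF, is 0x7FFFFF.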

As one could imagine, the ARx and CDP registers are bit subsets of XARx and XCDP. This means that the complete address is described by, for example, XCDP, with the highest 7 bits giving the page number or base address and CDP the 16-bit offset into that page. If we return to the issue of circular addressing, the code for the circular buffer setup for the coefficients is given below:

MOV #a0, XCDP   ; Use Coefficient Data Pointer register for coefficient
MOV #a0, BSAC   ; Set the corresponding Buffer Start Address
MOV #0, CDP     ; Initialize the offset from the start address to ZERO
MOV #4, BKC     ; Example is using 4 taps so that is the block length
BSET CDPLC      ; Set the CDP register in circular mode
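The effect of this setup on a post-incremented pointer can be modelled in C. The sketch below is a simplification that only handles a step of one; the struct and function names are hypothetical, and the fields are named after the registers they imitate.

```c
#include <stdint.h>

/* Simplified model of a pointer in circular mode. */
typedef struct {
    uint16_t bsa;    /* buffer start address, as loaded into BSAC */
    uint16_t bk;     /* buffer length in words, as loaded into BKC */
    uint16_t index;  /* offset from the start address, as held in CDP */
} circ_ptr;

/* Return the 16-bit address of the current access, then post-increment
 * the index and wrap it at the buffer length (models *CDP+). */
uint16_t circ_post_inc(circ_ptr *p)
{
    uint16_t addr = (uint16_t)(p->bsa + p->index);
    p->index = (uint16_t)((p->index + 1u) % p->bk);
    return addr;
}
```

With a start address of 0x1000 and a length of 4, five successive accesses touch 0x1000, 0x1001, 0x1002, 0x1003 and then 0x1000 again, so no manual subtraction is needed after each pass over the taps.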

If we were to set up an ARx register in circular mode we would have to pay attention to which XARx to use. Using AR1 in circular mode we will need to initialize XAR0, odd but true. The current configuration of an address register depends on bits in the status register ST2_55 in the P-Unit (Program Flow Unit). The P-Unit is responsible for loops, code behavior, pipeline protection, etc.

To set CDP in linear mode again, use:

BCLR CDPLC


3.3 Repeat instructions

We start with the single repeat instruction.

MOV #(numtaps-1), CSR
RPT CSR
MAC *AR2+, *CDP+, AC0
::MAC *AR3+, *CDP+, AC1

The RPT instruction will repeat a single instruction a maximum of 64k times, since its corresponding register is 16 bits wide. You could also supply the number of times as a constant. Since the MACs can be executed in parallel they are viewed as one instruction and are therefore repeatable. If you want to loop a block of code you can use either RPTB or RPTBLOCAL. To use them you first have to set up the corresponding BRCx register (Block Repeat Counter), then decide whether your block of code would fit within 56 bytes.

MOV #outer_cnt, BRC0    ; load outer loop count
MOV #inner_cnt, BRC1    ; load BRC1, auto-load BRS1
RPTBLOCAL outer         ; use BRC0
. . .
RPTBLOCAL inner         ; BRC1: decrements, BRS1-no change
. . .
inner: last_inner
. . .
outer: last_outer

One major enhancement over the predecessor C54x, besides the dual MACs, is the ability shown above to nest block repeat instructions with NO context save required. You can also insert a RPT instruction within the inner loop, the outer loop, or both. If you just want to use a single-level block repeat, the loop count is put in BRC0.
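The counter behaviour of the nested block repeat can be imitated in C. This is a sketch under one assumption: a counter loaded with N yields N+1 passes, which is consistent with the `#(numtaps-1)` load used with RPT CSR. The function name is hypothetical.

```c
#include <stddef.h>

/* Count how often the innermost block runs for given counter loads.
 * Assumption: a counter loaded with N gives N + 1 passes. */
size_t nested_repeat_iterations(unsigned outer_cnt, unsigned inner_cnt)
{
    size_t executed = 0;
    unsigned brc0 = outer_cnt;          /* outer counter, like BRC0 */
    do {
        unsigned brc1 = inner_cnt;      /* inner counter, reloaded on each outer pass */
        do {
            executed++;                 /* body of the inner block */
        } while (brc1-- != 0);
    } while (brc0-- != 0);
    return executed;
}
```

Under that assumption, loading 1 and 2 runs the inner body 2 × 3 = 6 times, and loading 0 and 0 runs it exactly once.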


4 Assignment

The assignment in this lab is to implement the same FIR filter equation that was described in lab one, but this time for a completely different architecture. Start by downloading lab4_2001.zip from the SMD077 homepage. The following files are included.

Source files
• fir.c - FIR algorithm implementation in C.
• fir_asm.asm - Example code in mnemonic C55x assembly. The example should be replaced with your implementation.
• fir.h - Defines all callable fir functions.
• defines.h - Various settings.
• main.c - Contains the entry point for the DSP and connects to IO data.

Code Composer Studio (CCS) files
• lab4_c5510.mem - Used for memory management and code linking.

Data files
• fir_hp.dat - 33-tap highpass filter breaking just below 4 kHz. Scaled to fixed point by a factor 2^7. TI Hex format.
• 200_4000.dat - 8192 samples of a mixed 200 Hz and 4 kHz signal. TI Hex format. Samples will easily fit within a 16-bit signed word representation.
• reference.dat - Output generated from the included fir_c function over 4 blocks of samples, using 200_4000.dat and fir_hp.dat.

TI Hex format = 16-bit 2's complement signed value written in 4 digit hex

MATLAB files
• datmaker.m - Given a vector, writes a TI .dat file.
• datreader.m - Returns a vector, given a TI .dat file.
• db_diff.m - Visualizes the output .dat file for frequency analysis.

Look in the .m files for further descriptions.
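To make the data representation concrete, here is a small C sketch that decodes one 4-digit hex word into a signed 16-bit value and undoes the 2^7 tap scaling. The function names are hypothetical, and the sketch deliberately does not parse a whole .dat file, since the file layout is not described here; datreader.m remains the reference for that.

```c
#include <stdint.h>
#include <stdlib.h>

/* Decode one 16-bit 2's complement value written as 4 hex digits. */
int16_t hex_word_to_int16(const char *hex4)
{
    unsigned long raw = strtoul(hex4, NULL, 16) & 0xFFFFul;
    /* Values with the top bit set represent negative numbers. */
    return (raw & 0x8000ul) ? (int16_t)((long)raw - 0x10000L) : (int16_t)raw;
}

/* Convert a tap that was scaled by 2^7 back to its real value. */
double tap_to_real(int16_t tap)
{
    return tap / 128.0;
}
```

For instance, "FFFF" decodes to -1 and "8000" to -32768, and a stored tap of 64 corresponds to a coefficient of 0.5.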


4.1 Step one - Code Composer Studio

You will start this assignment by creating a new project, but first start CCS 2.0:

Start button -> Programs -> Texas Instruments -> Code Composer 2 (‘C5000) -> Code Composer Studio

1) Project->New

Leave the defaults and just supply a path and name. Use the name lab4 for convenience further on.

2) Right click on the lab4 project in the menu tree on the left side and select Add files…, or use Project->Add files to Project…. DO NOT add the .h files; these are scanned for automatically and added if needed.
* Start by adding the .c files,
* then the .asm file, and
* last, add a run-time library for the specific target, in our case the TMS320C55x. The file needed is called rts55.lib and is located at

C:\program\ti\c5500\cgtools\lib\rts55.lib

3) Tools->Linker Configuration

If not already selected, select Visual Linker.


4) Project->Rebuild All, or use the toolbar.

Notice that the .h files are now present in the Include folder. The compiler will complain about a missing .rcp file; double click on that message or go to File->New->Visual Linker to invoke the wizard. Press Next in the first two steps, then select lab4_c5510.mem as the target map file.

At the fourth step we recommend that you end the wizard by pressing Accept, then Finish.

Go to the project file tree once again and double click on the newly created lab4.rcp, which will invoke the Visual Linker. A popup window will inform you about Not Yet Placed sections; press OK. Drag and drop the different sections in the Not Yet Placed folder to suitable blocks in the memory map. Since variables benefit more from residing in the on-chip dual access RAM (DARAM), we will put them there. The code can use single access RAM (SARAM). A memory page is 64k words large and each block is 16k words, which easily covers the memory requirements for this assignment.

Change the loadable .out file to lab4.out. This is also done within the Visual Linker, under Output Files. Right click on the a.out file and choose Properties to change the path and name.

Close the Visual Linker window and save the changes. Rebuild the project once again; this time it should complete without errors and warnings. Ignore remarks, if any. Instead of generating an executable .exe, CCS will produce .out files.

5) File->Load Program

Select the generated lab4.out file and press Open. Now that the program is loaded into the target, which for this assignment is a simulator, CCS can use symbol information for debugging and profiling the source code. A popup window showing native assembly code tells you what has been sent to the target; you can close that window if you like.

6) Now attach IO capabilities to the project. Open main.c from the project tree on the left side and locate the call to getTaps(). Make sure the cursor is on that line and toggle the ProbePoint button (recall its location from the toolbar snapshot). Repeat that procedure for the calls to dataInput() and dataOutput(). Blue diamonds should appear in the left margin.


Now connect those points in the code with external files by selecting File->File I/O… and adding the files fir_hp.dat, 200_4000.dat and lab4_out.dat. For each file:
• Add the file; the type should be *.dat (Hex).
• Set Address to the corresponding buffer, declared topmost as a global array in main.c.
• Set Length to the corresponding buffer size, defined in defines.h.
• Connect a probe point.

So that gives,
• fir_hp.dat, 33, taps_buffer
• 200_4000.dat, 128, input_buffer
• lab4_out.dat, 128, output_buffer (the name chosen here is arbitrary). The output file must be added under the File Output tab.


Now press the Add Probe Point button within the File IO window. Select a line number in the Probe Point field, then match it to a file in the Connect to combo box. Select replace. Repeat until all probe points are connected.

Press OK twice to exit the File I/O dialog properly. Notice the three small player windows, which will be stacked on top of each other; separate them and move them to a place where they are not in the way.

7) Look at the buffers by selecting View->Graph->Time/Frequency... and changing some properties. Remember that taps_buffer has only 33 values compared to 128 for input_buffer and output_buffer.

8) Profile the code by selecting Profiler->Start New Session. Profiling means measuring performance, in our case the cycles spent on a particular piece of code. The session name is arbitrary. Change the tab in the profile window to Functions and mark the fir_c function header in fir.c, then drag and drop the selection to the profile window. The profile window displays the number of times the profiled area or function has been accessed, and various statistical measurements.

9) Run the code and simulate. This is normally done by pressing the "Running man" button on the left side or by Debug->Run. We suggest that you choose Debug->Go Main, then step through the code using F10 and F8. If you get caught in a for loop, just put a breakpoint at a line further down and press F5. Recall from the toolbar snapshot what the Toggle Breakpoint button looks like. When stepping through the code instead of free running, you will more easily follow what happens. Hopefully all three graphs look as they should: a 200/4000 Hz signal, a high pass filter and a ~4000 Hz signal.


Remember to rewind the data players before trying to use them for another run. At a minimum, be sure to rewind the filter file, since the output is not limited unless specified and the input has more samples in the file. If you ever forget this, just select Debug->Restart and click the players until you can rewind them, then redo the simulation. If you want to do the simulation again, use Debug->Restart or File->Reload Program. After any changes in the source you need to rebuild and reload the program.

4.2 Step two – Implementing the filter

Implement the FIR filter algorithm using assembly instructions. The skeleton code in fir_asm.asm is written in mnemonic assembly. There also exists a somewhat higher-level version of the language called algebraic assembly, but since the goal of this assignment is to give an insight into DSP architecture it is better to keep things native. You should put the code in the function fir_asm, that is, between the label _fir_asm: and the RET instruction. We have implemented fir_setup_asm, which copies the filter taps to a buffer reachable from the assembler file and stores the number of filter taps. If you want to add any further “local” variables or buffers you can follow the given pattern, or rewrite fir_asm.asm completely. The code given is just a bit of help to get started. The code given in fir_asm performs a simple example of moving data: first it copies the taps to the output block, then it fills up the remainder of the block with input samples.


One important thing to remember is to change the function pointer assignments in main.c to use both fir_asm_setup and fir_asm. Since much of the code for implementing a block FIR filter can be found in reference documentation, we have added some further requirements to the implementation. These are the same as for the previous assignments, namely:
• Implement a history function so that continuous blocks of data can be processed.
• The algorithm should work for an odd number of taps as well as an odd block size.

This also gives ambitious students more credit, since implementation efforts can be recognized more easily. You can test your implementation further in MATLAB using db_diff.m. This MATLAB script compares a user-generated .dat file with the original and a reference file in the frequency domain. The plots are in dB (decibel = 20*log10(x)) and show how the filter has affected the spectrum. In the default case with fir_hp.dat it attenuates the low frequency by about 5-6 dB and should leave the high frequency part fairly intact (less than 1 dB difference). As long as your generated file gives results similar to the reference, your implementation should be correct.
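The history requirement can be illustrated with a small C model. This is a sketch only, not the fir.c reference and not a substitute for the assembly implementation: the names `fir_state`, `fir_init`, `fir_block` and `MAX_TAPS` are hypothetical, and the output scaling (division by 2^7 to undo the tap scaling) is an assumption that should be checked against fir.c.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAPS 64   /* hypothetical upper limit for this sketch */

typedef struct {
    int16_t taps[MAX_TAPS];
    int16_t history[MAX_TAPS - 1];  /* last numtaps-1 inputs seen so far */
    size_t  numtaps;
} fir_state;

/* Store the taps and clear the history. Returns 0 on success, -1 on bad size. */
int fir_init(fir_state *s, const int16_t *taps, size_t numtaps)
{
    if (numtaps == 0 || numtaps > MAX_TAPS)
        return -1;
    memset(s, 0, sizeof *s);
    memcpy(s->taps, taps, numtaps * sizeof taps[0]);
    s->numtaps = numtaps;
    return 0;
}

/* Fetch x(n-k): from the current block if available, otherwise from history. */
static int16_t sample_at(const fir_state *s, const int16_t *in, size_t n, size_t k)
{
    if (k <= n)
        return in[n - k];
    /* history[numtaps-2] is the most recent sample of the previous data */
    return s->history[(s->numtaps - 1) - (k - n)];
}

/* Filter one block of any size (odd or even). in and out must not overlap. */
void fir_block(fir_state *s, const int16_t *in, int16_t *out, size_t blocksize)
{
    size_t hist_len = s->numtaps - 1;

    for (size_t n = 0; n < blocksize; n++) {
        int64_t acc = 0;
        for (size_t k = 0; k < s->numtaps; k++)
            acc += (int32_t)s->taps[k] * sample_at(s, in, n, k);
        out[n] = (int16_t)(acc / 128);   /* assumed: undo the 2^7 tap scaling */
    }

    /* Keep the newest hist_len input samples for the next call. */
    if (blocksize >= hist_len) {
        memcpy(s->history, in + (blocksize - hist_len), hist_len * sizeof in[0]);
    } else {
        memmove(s->history, s->history + blocksize,
                (hist_len - blocksize) * sizeof in[0]);
        memcpy(s->history + (hist_len - blocksize), in, blocksize * sizeof in[0]);
    }
}
```

Because the state carries the last numtaps-1 inputs, filtering five samples as blocks of 3 and 2 gives the same output as filtering them in one call, which is exactly what processing continuous blocks requires.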


4.3 Step three - questions

Answer these questions the best you can.

Question 1) – In circular mode, what does the value contained in ARn or CDP signify?
Question 2) – Which internal buses are used to read operands for a dual-MAC instruction?

See Goal and Submission for where to write the answers and how to submit the lab.

4.4 Hints

• Use comments when coding assembler, especially since the syntax is rather cryptic.
• If you reset the CPU in Code Composer, DO NOT run the simulation without loading a new program; it will hang completely and you will need to connect all those Probe Points once again.

5 Goal

In order to get this lab accepted you must

1. Develop code that implements the filter algorithm using mnemonic assembly. The code should be put between _fir_asm: and RET in the file fir_asm.asm. This implementation has to be functionally correct, include a history capability, work for even or odd numbers of taps and/or block sizes, and meet the error requirements.

2. Write a text file called lab4.txt which has your names (see below). It should contain timing information for your assembler function, as well as a description of what you have done (e.g. optimization techniques) and why.

3. Answer the questions in section 4.3 (step three). Put these answers in lab4.txt (see below).

Lab4.txt format
====================================================================
Firstname1 Lastname1
Firstname2 Lastname2
====================================================================
Timing information of fir_asm()
====================================================================
Answer to questions
1)
2)
====================================================================
Optimization description
====================================================================


6 Submission

For submission of the lab, write an email with the format shown below. Attach the files fir_asm.asm and lab4.txt. Send the email to: [email protected]

Note! For the due date refer to the lab page. Make sure to submit the lab before this deadline for full credit.

Email format
Subject: lab4 smd077
Attachments: fir_asm.asm, lab4.txt
Body of mail:
Firstname1 Lastname1 email address
Firstname2 Lastname2 email address

This is how we will test and verify your lab

We will randomly choose a number of taps for the filter and a number of samples for the block size. These values will be used when correcting and benchmarking all groups. If everything is ok, you have passed lab four and will receive a reply by mail. If you have not passed, you will have one week from the date you receive the reply (by mail) to correct and re-submit your lab.


7 References

[1] In Code Composer Studio, select Help->User Manuals, or locate the files in C:\program\ti\docs\pdf\. Documentation of special interest (TMS320C55x manuals):

• SPRU517, C55x Instruction Set Simulator User's Guide - The user's guide for the TMS320C55x™ instruction set simulator, available within Code Composer Studio. Describes the basic capabilities of the simulator and the features provided for configuring it.
• SPRU509, Code Composer Studio Getting Started Guide - Provides basic procedures in program development flow order to help you begin programming with Code Composer Studio.
• SPRU280, TMS320C55x Assembly Language Tools User's Guide - Assembly language tools designed for the TMS320C55x devices: assembler, archiver, linker, absolute lister, cross-reference utility, hex-conversion utility.
• SPRU371, TMS320C55x DSP CPU Reference Guide - The architecture, registers, and operation of the CPU for TMS320C55x devices.
• SPRU374, TMS320C55x DSP Mnemonic Instruction Set Reference Guide - The mnemonic instructions. Also includes a summary of the instruction set, a list of the instruction opcodes, and a cross-reference to the algebraic instruction set.
• SPRU376, TMS320C55x DSP Programmer's Guide - Tips for programming the TMS320C55x devices. A revision of SPRU376 is scheduled to be released in mid-year 2001 (see the TI Web site).
• SPRU281, TMS320C55x Optimizing C/C++ Compiler User's Guide - Compiler tools designed for the TMS320C55x devices: C/C++ compiler, source interlist utility, library-build utility.
• SPRU393, TMS320C55x Technical Overview - The CPU architecture, low-power enhancements, and embedded emulation features of the TMS320C55x devices.

[2] http://www.ti.com