Source: software-dl.ti.com/public/hpmp/software/leveraging_dsp...


DSP Optimization

I. Purpose

The goal of this lab is to demonstrate basic optimization techniques. The lab is conducted on an EVM board, but it can also be run on the simulator using the estimated cycle count.

II. Project Files

The following files are used in this lab:

firMain.c
intrinsicCFilters.c
linker.cmd
naturalCFilters.c
test.h
utilities.c


Part 1: Device Preparations

Task 1.1: EVM Configuration

1. Verify that the EVM is set to ‘no boot’ mode, as shown.

Boot Mode | DIP SW3 (Pin 1, 2, 3, 4) | DIP SW4 (Pin 1, 2, 3, 4) | DIP SW5 (Pin 1, 2, 3, 4) | DIP SW6 (Pin 1, 2, 3, 4)
No boot   | (off, on, on, on)        | (on, on, on, on)         | (on, on, on, on)         | (on, on, on, on)

NOTE: Additional EVM switch settings are available at the Processors Wiki: http://processors.wiki.ti.com/index.php/TMDXEVM6678L_EVM_Hardware_Setup#Boot_Mode_Dip_Switch_Settings  

Task 1.2: Create a New Target in CCS

1. Launch CCS by double-clicking the CCS icon on the laptop desktop. NOTE: As CCS initializes, a pop-up window appears showing a default workspace. Replace the default workspace with the following path: C:\ti\workspace

2. Create a new target configuration:

a Select the CCS menu option View > Target Configurations.

b Select User Defined.

c Right-click User Defined and select New Target Configuration.

3. Enter the name of the new target configuration in the File Name: text box.

a The file name is based on the EVM model: <model>.ccxml. For example, EVM6678LE.ccxml.

b Check the box next to Use shared location.

c Leave the Location as the default value.

d Click Finish.

 


4. The .ccxml file opens in a GUI-based view with the Basic tab active. Under General Setup, choose the Blackhawk XDS560V2-USB System Trace Emulator from the Connection pull-down menu, as shown.

5. The Board or Device field identifies the TI processor device. To find the device used in this lab, enter 6678 in the "type filter text" box. Locate and check the box for TMS320C6678, as shown.

6. Under Save Configuration, click the Save button.

 


7. Select the Advanced tab at the bottom of the screen.

8. Select Core 0 on the target: TMS320C6678_0 > IcePick_D > subpath_0 > C66x_0

9. Once the core is selected, the Cpu Properties section lets you set the CPU properties, starting with an initialization script.

a Click Browse… to locate the appropriate GEL file. The GEL file is installed with CCS in C:\ti\ccsv6\ccs_base\emulation\boards\evmc6678l\gel. NOTE: The path may vary slightly depending on your system.

b Select evmc6678l.gel and click Open.

c Click Save.

10. You can now close the Target Configuration window by clicking the X next to the file name (e.g., EVM6678LE.ccxml).

 


Part 2: Basic DSP Optimization

Task 2.1: Create, Build and Run the Project

1. Open CCS if you have not done so already.

2. Open the New CCS Project window from the CCS menu: File > New > CCS Project. NOTE: Depending on the version of CCS, the menu path to CCS Project may differ slightly.

3. Make sure that Target is set to Generic C66xx Device (use the pull-down menu to set it, if needed).


4. Enter Optimization as a Project Name.

5. Click the check box next to Use default location.

NOTE: The default location for your local settings will vary from the screenshot shown.

6. Under Project templates & examples, select Empty Project > Empty Project. NOTE: Choose the Empty Project without main.c.

7. Press Finish to create the new project.

8. The Optimization project is now added to the Project Explorer view. Click on the small arrow on the left of the project name to display the two sub-directories: Includes and Debug.

9. Right-click on the newly-created Optimization project, and click on Add Files…

10. Browse to C:\ti\_labs, select all the files in this directory, and click Open.

NOTE: The path may vary slightly depending on your system

11. When prompted how files should be imported into the project, leave it as the default Copy Files, and then click OK.

12. For the following steps, double-click an imported source file to open it in the editor view.

13. Examine the code in firMain.c to understand the functions that are being called:

a The generateData function generates the data sets to be operated on.

b The functions naturalCFilters and intrinsicCFilters run the filters on the generated data. The former is implemented entirely in natural C, while the latter uses compiler intrinsics.


14. Set the properties for the Debug configuration. Right-click on the project. Select Properties.

a In the Properties dialog window, choose Build, click the Environment tab, and click the Add… button to add a path variable:

b Name = PDK_ROOT

c Value = C:\ti\mcsdk_2\pdk_C6678_1_1_2_6

NOTE: The path may vary slightly depending on your system.

d Check the Add to all configurations box and click OK.

 


e Choose C6000 Compiler Optimization and set/verify the following properties:

Optimization level = 0

Optimize for code size = 0

f Choose C6000 Compiler Debug Options and set/verify the following properties:

Debugging model = Full symbolic debug

g Choose C6000 Compiler Include Options. Under the Add dir to #include search path, add the following two paths. NOTE: To add a path, click on the green +/add icon.

${PDK_ROOT}/packages/ti/csl

${PDK_ROOT}/packages

15. Click the OK button to save the project properties and close the Properties window.

 


16. Right-click on the Optimization project and select Rebuild Project. A successful build will generate the following output on the console:
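The natural C filter mentioned in step 13 is, in general form, a complex FIR loop. The following is a minimal sketch, not the lab's code: it assumes complex single-precision samples stored as interleaved real/imaginary floats (the 8-byte complex values mentioned in Task 2.8), and the names complex_f and fir_complex_natural are hypothetical.

```c
/* Hypothetical complex sample type: interleaved real/imaginary floats (8 bytes). */
typedef struct {
    float re;
    float im;
} complex_f;

/*
 * Natural-C complex FIR filter (sketch, not the lab's implementation).
 * out[n] = sum over k of coef[k] * in[n + k]  (taps applied in correlation order)
 * The caller must supply at least n_out + taps - 1 input samples.
 */
void fir_complex_natural(const complex_f *in, const complex_f *coef,
                         complex_f *out, int n_out, int taps)
{
    for (int n = 0; n < n_out; n++) {
        float acc_re = 0.0f;
        float acc_im = 0.0f;
        for (int k = 0; k < taps; k++) {
            /* (a + jb)(c + jd) = (ac - bd) + j(ad + bc) */
            acc_re += coef[k].re * in[n + k].re - coef[k].im * in[n + k].im;
            acc_im += coef[k].re * in[n + k].im + coef[k].im * in[n + k].re;
        }
        out[n].re = acc_re;
        out[n].im = acc_im;
    }
}
```

For example, with taps {1, j} and inputs {1, j, 1 + j, 2}, the three outputs are 0, -1 + 2j, and 1 + 3j.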

Task 2.2: Connect to the EVM

1. Verify the following:

The hardware target is configured as described in Part 1: Device Preparations.

The mezzanine card on the EVM is connected via the USB cable to the laptop.

The EVM is powered on and the red LED light is active. You may have to wait a few seconds after connecting the power for the red light to appear.

 


2. In the Target Configurations view, right-click the target that you defined earlier and launch it. NOTE: If the Target Configurations view is not open, open it from the View menu, as shown.

3. Launch the target configuration as follows:

a Select the target configuration file EVM6678LE.ccxml.

b Right-click and select Launch Selected Configuration.

 


4. CCS changes the perspective to Debug perspective, as shown.

5. Select Core 0: C66x_0, then right-click and select Connect Target. The initial configuration of the device is displayed in the console, as shown.

 


Core 0: C66x_0 is now in suspended mode, as shown:

Task 2.3: Load and Run the Program

1. Enable the clock by selecting the CCS menu option Run > Clock > Enable.

2. Select Core 0 and load the .out file created earlier in the lab:

a Select the CCS menu option Run > Load > Load Program.

b Click Browse project…

c Select optimization.out by expanding the Optimization > Debug folders.

d Click OK to load the application to the target (Core 0).

3. Run the application by selecting the CCS menu option Run > Resume.

4. A successful run should produce a console output, as shown. Record the cycle time for both natural C and intrinsic C versions. NOTE: You may get slightly different timings.

NOTE: If the time shows zero, you have not enabled the clock (see step 1).
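The reported times are presumably produced by reading a clock before and after each filter call. A minimal sketch of that pattern, assuming the lab uses the standard C clock() function; time_call, work_fn, sum_job, and sum_work are hypothetical names, and sum_work merely stands in for a filter. If the lab's clock() is serviced by the debugger, that would be consistent with the NOTE that the time reads zero until the clock is enabled; treat the tick units as the lab's "time" units rather than assuming seconds.

```c
#include <time.h>

/* Hypothetical signature for the routine being timed (e.g. one filter pass). */
typedef void (*work_fn)(void *arg);

/*
 * Returns the elapsed clock() ticks spent in fn(arg),
 * or -1 if clock() reports that no time source is available.
 * Assumption: the lab measures time with clock(); this is a host-runnable sketch.
 */
long time_call(work_fn fn, void *arg)
{
    clock_t start = clock();
    if (start == (clock_t)-1)
        return -1;
    fn(arg);
    clock_t stop = clock();
    return (long)(stop - start);
}

/* Hypothetical stand-in workload: sums an array of floats. */
typedef struct {
    const float *data;
    int n;
    float result;
} sum_job;

void sum_work(void *arg)
{
    sum_job *job = (sum_job *)arg;
    float s = 0.0f;
    for (int i = 0; i < job->n; i++)
        s += job->data[i];
    job->result = s;
}
```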


Task 2.4: Compiler Optimization

1. Switch back to the CCS Edit perspective.

2. Next, set the properties for the Release configuration, which suppresses all debug features and enables the highest level of speed optimization.

a Right-click on the Optimization project and select Build Configurations > Set Active > Release, as shown.

3. Right-click on the Optimization project. Select Properties.


a If you have not done so already, choose Build, click on the Environment tab, and click the Add… button to add the path variable with Name as PDK_ROOT and Value as C:\ti\mcsdk_2\pdk_C6678_1_1_2_6

NOTE: The path may vary slightly depending on your system

b Select C6000 Compiler Optimization and set/verify the following properties:

Optimization level = 3

c Choose C6000 Compiler Debug Options and ensure that:

Debugging model = Suppress all symbolic debug generation

d Choose C6000 Compiler Include Options. Under the Add dir to #include search path, add the following two paths:

${PDK_ROOT}/packages/ti/csl

${PDK_ROOT}/packages

4. Click the OK button to save the project properties and close the Properties window.

5. Right-click on the Optimization project and select Rebuild Project. A successful build will generate output on the console as shown:

6. Change back to CCS Debug perspective.

7. Enable the clock by selecting the CCS menu option Run > Clock > Enable (skip this step if you already enabled the clock in Task 2.3).

8. Select Core 0 and load the .out file you just created.

a Select the CCS menu option Run > Load > Load Program.

b Click Browse project…

c Select optimization.out by expanding the Optimization > Release folders, and click OK. NOTE: Be sure to use the .out file from the Release folder, NOT the Debug folder.


d Click OK to load the application to the target (Core 0).

9. Run the application by selecting the CCS menu option Run > Resume.

10. A successful run should produce a console output as shown below. Record the optimized cycle times for both natural C and intrinsic C versions:

    DONE
    natural C code size   32768  time 228698
    intrinsic C code size   32768  time 1282215
    no error was found  !!!
    DONE

QUESTIONS: 

How much improvement is noted for the natural C code? _________________________

What about the intrinsic code? ______________________________________________

Do intrinsic functions better utilize the processor?
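The improvement asked for in these questions is a ratio of cycle counts: divide the earlier (Debug, -O0) time by the later (Release, -O3) time. A minimal helper, with speedup as a hypothetical name, is sketched below. Applied to the two times in the sample output, 1282215 / 228698 is about 5.6, which means that at this stage (before software pipelining is enabled) the intrinsic version takes roughly 5.6 times as long as the natural C version.

```c
/* Ratio of two cycle counts: a result > 1.0 means `after` is faster than `before`.
 * Returns 0.0 when `after` is zero (e.g. the clock was not enabled), since the
 * ratio is undefined in that case. */
double speedup(unsigned long before, unsigned long after)
{
    if (after == 0)
        return 0.0;
    return (double)before / (double)after;
}
```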

 


Task 2.5: Enable Software Pipelining

1. Switch to the CCS Edit perspective.

2. Right-click the project in Project Explorer, select Properties, and go to Build > C6000 Compiler > Advanced Options > Assembler Options.

3. Check the box that says Keep the generated assembly language (.asm) file, as shown.

4. Click OK.

5. Rebuild the code.

6. Because you are building the Release configuration of the Optimization project, the generated assembly file is located in the Release directory. Open intrinsicCFilters.asm and answer the following questions:

QUESTIONS:

Was the compiler able to schedule the software pipeline? _________________________

What are the general reasons that the compiler might not schedule the software pipeline? _______________________________________________________________________

_______________________________________________________________________

Hint: Think about cases that can cause randomness in the execution timing.

What reason can you find in this code that might prevent the compiler from scheduling the software pipeline? _______________________________________________________________________

_______________________________________________________________________

Hint: Think about the inline function.

 


7. Open intrinsicCFilters.c and replace the regular function with the intrinsic function in all the loops. HINT: Look at the definition of the regular function and see which intrinsic it uses. NOTE: The code must be replaced in all four loops. Here is an example of one of the filters:

8. From the CCS menu, select File > Save All.

9. Rebuild the code, then load and run. NOTE: If prompted, you can load the program automatically as soon as the build is complete.

10. The results will look like the following. Compare them with the previous results.

11. Look at the file intrinsicCFilters.asm. QUESTIONS:

Did the compiler schedule the software pipeline? ____________________________

Record the optimized project cycle times for the natural C function and for the intrinsic function with software pipelining. ________________________________________________________
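The replacement made in step 7 can be illustrated with a portable sketch. Assumption: the lab's "regular function" is a small helper wrapping one complex multiply; the names cfloat, cmul_helper, mac_with_call, and mac_inline are hypothetical, and plain float arithmetic stands in for the C66x intrinsic the lab uses, so this shows the shape of the change rather than the lab's exact code. On the C6000 compiler, a loop that still contains a function call is not software-pipelined; writing the operation directly in the loop body removes that obstacle without changing the result.

```c
typedef struct {
    float re;
    float im;
} cfloat;

/* "Before": the multiply lives in a separate function. If the compiler does not
 * inline it, the loop body contains a call, which blocks software pipelining. */
cfloat cmul_helper(cfloat a, cfloat b)
{
    cfloat r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

cfloat mac_with_call(const cfloat *x, const cfloat *h, int n)
{
    cfloat acc = { 0.0f, 0.0f };
    for (int i = 0; i < n; i++) {
        cfloat p = cmul_helper(x[i], h[i]);
        acc.re += p.re;
        acc.im += p.im;
    }
    return acc;
}

/* "After": the same arithmetic written directly in the loop, so no call remains. */
cfloat mac_inline(const cfloat *x, const cfloat *h, int n)
{
    float acc_re = 0.0f;
    float acc_im = 0.0f;
    for (int i = 0; i < n; i++) {
        acc_re += x[i].re * h[i].re - x[i].im * h[i].im;
        acc_im += x[i].re * h[i].im + x[i].im * h[i].re;
    }
    cfloat acc = { acc_re, acc_im };
    return acc;
}
```

Both versions compute the same complex multiply-accumulate; only the loop structure the compiler sees is different.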

 


Task 2.6: Align the Data

1. In intrinsicCFilters.c, the data is read from memory. The data itself is defined in firMain.c, as shown.

QUESTIONS:

What is the alignment of the input data? _____________________________________________

What is the alignment of the filter coefficients (on the stack)? _____________________________

Hint: Find the pragma that aligns the data. What other ways are there to align the data on a 64-bit boundary?

2. Change the intrinsicCFilters.c code to tell the compiler that the data is loaded from aligned memory. This is done by changing the memory-access intrinsic. The unaligned memory-access intrinsic is _memX, where X indicates the size and type of the access. The aligned form, _amemX, tells the compiler that the data is aligned (on a 64-bit boundary for the 8-byte accesses used here).

3. In all four filters, replace the instruction

    y = _mem8_f2(p_in++);

with the aligned instruction

    y = _amem8_f2(p_in++);

4. Rebuild the code, load, and run.

5. Record the optimized project cycle time for natural C function and for intrinsic function with software pipeline and aligned load.
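The aligned intrinsic is only safe when the pointer really is 8-byte aligned. A minimal sketch of two ways to request 64-bit alignment for a buffer: the TI C6000 compiler's #pragma DATA_ALIGN(symbol, alignment), and, as a portable alternative for host builds, the C11 _Alignas specifier. The names NUM_SAMPLES, input_samples, and is_aligned8 are hypothetical, not taken from firMain.c.

```c
#include <stdint.h>

#define NUM_SAMPLES 64   /* hypothetical size; the lab sets its size in test.h */

#ifdef __TI_COMPILER_VERSION__
/* TI C6000 compiler: place the array on an 8-byte (64-bit) boundary. */
#pragma DATA_ALIGN(input_samples, 8)
float input_samples[2 * NUM_SAMPLES];            /* interleaved re/im pairs */
#else
/* Portable C11 alternative for host builds. */
_Alignas(8) float input_samples[2 * NUM_SAMPLES];
#endif

/* Returns 1 if p lies on a 64-bit (8-byte) boundary, as _amem8_f2 requires. */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p % 8u) == 0u;
}
```

Note that stepping a float pointer by one element breaks the alignment; stepping by one complex pair (two floats, 8 bytes) preserves it.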

Task 2.7: Enable the MUST_ITERATE Pragma

1. Uncomment the lines in intrinsicCFilters.c that contain the pragmas telling the compiler the minimum number of loop iterations and a number that always divides the iteration count (again, in all four filters).

2. Rebuild the code, load, and run.

3. Record the optimized project cycle time for natural C function and for intrinsic function with software pipeline, aligned load, and MUST_ITERATE pragma.
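The pragma in question has the form #pragma MUST_ITERATE(min, max, multiple) and is placed immediately before the loop it describes; an argument may be left empty. A minimal sketch, assuming a trip count that is always at least 8 and a multiple of 4 (hypothetical values, not the lab's); sum_squares is a hypothetical stand-in for a filter loop. The guarantees let the compiler drop the short-trip-count fallback and unroll safely, but the compiler trusts them, so calling the function with a count that violates them can give wrong results.

```c
/* Sum of squares of n floats (hypothetical stand-in for a filter's inner loop).
 * Contract assumed by the pragma: n >= 8 and n % 4 == 0. */
float sum_squares(const float *x, int n)
{
    float acc = 0.0f;
#ifdef __TI_COMPILER_VERSION__
    /* min = 8 iterations, no stated maximum, trip count is a multiple of 4 */
    #pragma MUST_ITERATE(8, , 4)
#endif
    for (int i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc;
}
```

The #ifdef guard keeps the sketch compiling cleanly on a host compiler, which does not recognize the TI pragma.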


Task 2.8: Cache Considerations

1. In test.h, change the number of elements to 2K, then rebuild the code, load, and run.

2. Record the cycle counts for each case.

Size | Clock Value | Cycles per element (Clock Value / number of elements)
32K  |             |
2K   |             |

QUESTION:

Why is the number of cycles per element much lower with 2K elements? ___________________________________________

Hint: Think about L1 cache utilization. The L1D cache size is 32 KB. A single-precision floating-point value (32-bit) is 4 bytes; a complex floating-point value is 8 bytes.
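The table and hint reduce to simple arithmetic: cycles per element is the clock value divided by the element count, and the data footprint is the element count times 8 bytes per complex float. A minimal sketch with hypothetical names; the 32 KB limit is the cache size given in the hint, and only the input buffer is sized here, although coefficients and output also compete for the same cache.

```c
#include <stddef.h>

#define COMPLEX_FLOAT_BYTES 8u          /* two 32-bit floats */
#define CACHE_BYTES (32u * 1024u)       /* 32 KB cache size, per the hint */

/* Average cycles spent per element; 0.0 if there are no elements. */
double cycles_per_element(unsigned long clock_value, unsigned long num_elements)
{
    return num_elements ? (double)clock_value / (double)num_elements : 0.0;
}

/* Bytes occupied by num_elements complex floats. */
size_t footprint_bytes(size_t num_elements)
{
    return num_elements * COMPLEX_FLOAT_BYTES;
}

/* Returns 1 if a buffer of num_elements complex floats fits in the cache. */
int fits_in_cache(size_t num_elements)
{
    return footprint_bytes(num_elements) <= CACHE_BYTES;
}
```

With 32K (32768) elements the input alone is 256 KB, eight times the 32 KB cache, so it cannot stay resident; with 2K (2048) elements it is 16 KB and can.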

 

END OF OPTIMIZATION LAB 1