Ultra sound solution
Impact of C++ DSP optimization techniques
Research team discussion
Ultrasound probe (20 MHz) sends signals into the body that reflect off moving blood cells (artery? vein?)
The received ultrasound frequency is Doppler shifted compared to the transmitted frequency
Same as the sound when an ambulance goes by: higher if approaching, lower if receding
They get the positive frequencies (towards) on the left audio channel and the negative frequencies (away) on the right audio channel
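To put a number on the Doppler shift described above, here is a minimal sketch using the standard pulse-echo relation f_d = 2 * f0 * v * cos(theta) / c. The speed of sound (1540 m/s, a commonly quoted soft-tissue average), the function name and any velocity passed in are assumptions for illustration; none of them come from the slides.

```cpp
// Assumed average speed of sound in soft tissue, in m/s (not from the slides).
constexpr double kSpeedOfSoundTissue = 1540.0;

// Doppler shift in Hz for an echo from a moving scatterer.
// velocityTowardsProbe > 0 means the scatterer approaches the probe.
// The factor 2 appears because the wave is shifted once on the way in
// and once more on reflection.
inline double dopplerShiftHz(double transmitHz,
                             double velocityTowardsProbe,
                             double cosAngle = 1.0) {
    return 2.0 * transmitHz * velocityTowardsProbe * cosAngle
           / kSpeedOfSoundTissue;
}
```

With the 20 MHz probe and an assumed 0.5 m/s flow straight towards the probe, dopplerShiftHz(20e6, 0.5) is roughly 13 kHz, which lands in the audio band; flow away from the probe gives a negative shift.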
04/21/23.ENCM515 – Ultrasound ProblemCopyright [email protected] 2 / 33
Picture looks like this
Note that the display loses all direction information
Can I help them to output the maximum frequency?
Captured audio signal
Engineering Problems
Problem 5 – Different amplitudes common
Problem 6 – Why are funny dead spots not lining up in left and right channels?
Handling stereo not mono signals
Incorrect labeling / misinterpretation
Problem 7 – How to remove dead spots?
Max frequency – definition 1
Frequency below which X% of the frequencies fall
Noisy signal for large thresholds (> 80%)
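Definition 1 above can be sketched as a cumulative sum over a power spectrum. This is only one reading of "X% of the frequencies fall": it assumes the threshold is applied to spectral power per bin. The function names and the bin-to-Hz helper are hypothetical and not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first bin at which the cumulative power reaches `percent` %
// of the total power. Returns 0 for an empty or all-zero spectrum.
std::size_t percentileBin(const std::vector<double>& power, double percent) {
    assert(percent > 0.0 && percent <= 100.0);
    double total = 0.0;
    for (double p : power) total += p;
    if (total <= 0.0) return 0;

    const double target = total * percent / 100.0;
    double running = 0.0;
    for (std::size_t k = 0; k < power.size(); ++k) {
        running += power[k];
        if (running >= target) return k;
    }
    return power.size() - 1;  // only reachable through rounding error
}

// Convert a bin index of an fftSize-point spectrum to Hz.
double binToHz(std::size_t bin, std::size_t fftSize, double sampleRateHz) {
    return static_cast<double>(bin) * sampleRateHz
           / static_cast<double>(fftSize);
}
```

The sketch also hints at why large thresholds are noisy: the higher the percentage, the more the answer depends on the low-power tail of the spectrum, where noise dominates.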
After XPI Stage 2
Have a working algorithm concept
Engineering Problem 1 – Complex math (a + jb) on SHARC!
Engineering Problem 2 – Define maximum frequency
Zillions of blood cells, therefore a distribution of frequencies
Workable prototype – discuss more with customer
Engineering Problem 3 – SHARC D/A can't handle DC signal
Workable prototype – discuss more with customer
Engineering Problem 4 – Can SHARC handle all this in real time?
Problem 5 – Are different amplitudes of input channels common? Yes
Problem 6 – Why are funny dead spots not lining up in left and right channels? Artifact – mislabeled and misinterpreted samples
Problem 7 – How to remove dead spots? – Discuss more with customer
ProcessBlock DONE OUTSIDE INTERRUPT
AVOIDS RACE
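The slide's code is not in this transcript, so the following is only a generic sketch of the idea: the interrupt handler does nothing but store samples and flag a full buffer, and the block processing runs later from the main loop. The PingPong structure, the function names and the placeholder "processing" (a simple sum) are all hypothetical.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kBlockSize = 128;  // N = 128, as used later in the slides

struct PingPong {
    float buffers[2][kBlockSize] = {};
    std::size_t fillIndex = 0;         // next free slot (touched by ISR only)
    int fillBuffer = 0;                // buffer the ISR is filling (ISR only)
    std::atomic<int> readyBuffer{-1};  // -1 means nothing is waiting
};

// Stand-in for the audio interrupt: store one sample and, when the buffer
// is full, publish it and switch to the other buffer. No processing here.
void onSampleInterrupt(PingPong& pp, float sample) {
    pp.buffers[pp.fillBuffer][pp.fillIndex++] = sample;
    if (pp.fillIndex == kBlockSize) {
        pp.readyBuffer.store(pp.fillBuffer, std::memory_order_release);
        pp.fillBuffer ^= 1;
        pp.fillIndex = 0;
    }
}

// Called from the main loop, outside the interrupt.
// Returns true if a full block was waiting and has been processed.
bool processPendingBlock(PingPong& pp, float& sumOut) {
    const int ready = pp.readyBuffer.exchange(-1, std::memory_order_acquire);
    if (ready < 0) return false;
    float sum = 0.0f;  // placeholder for the real block processing
    for (std::size_t i = 0; i < kBlockSize; ++i) sum += pp.buffers[ready][i];
    sumOut = sum;
    return true;
}
```

The scheme only holds if the main loop finishes a block before the interrupt has refilled the other buffer, which is the real-time question raised as Engineering Problem 4.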
Real life problem -- Stereo
Minor changes to Audio Preemptive Task
Make "C" code more general
Moved buffer[ ] to external files
Unknown size of arrays being processed
Switch to Release mode
Switching to the optimizing compiler (ReleaseNWC) means breakpoints can no longer be set – fix with these steps
First look at code
Timing -- software loop with r2 as loop counter – test at end
N * (10 – 1) cycles (jump is not db, i.e. not a delayed branch)
-1 for 1 parallel instruction
Use Compiler Info button
3 stalls – 2 on software jump, 1 on ?
Obvious things to do
We are already processing left and right channels in one program
Switch to left audio in dm memory and right audio in pm memory
Need to do
Make right buffers 'pm'
Change prototype of function to pass pm
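A rough sketch of the two "need to do" items follows. It assumes the SHARC compiler's dm and pm memory-space qualifiers and its predefined __ADSP21000__ macro, and hides them behind the hypothetical LEFT_MEM / RIGHT_MEM macros so the same source still builds on a host compiler. The buffer and function names are illustrative only; check the compiler manual for the exact qualifier placement your tool version expects.

```cpp
#include <cstddef>

// dm / pm only exist on the SHARC tool chain (assumption: __ADSP21000__ is
// predefined there); on any other compiler the macros expand to nothing.
#if defined(__ADSP21000__)
  #define LEFT_MEM  dm
  #define RIGHT_MEM pm
#else
  #define LEFT_MEM
  #define RIGHT_MEM
#endif

constexpr std::size_t kPoints = 128;

// Left channel buffers in data memory, right channel buffers in program memory.
float LEFT_MEM  leftIn[kPoints];
float LEFT_MEM  leftOut[kPoints];
float RIGHT_MEM rightIn[kPoints];
float RIGHT_MEM rightOut[kPoints];

// The prototype carries the memory space of each pointer, so the compiler
// is free to pair one dm access with one pm access in the same cycle.
void copyStereoBlock(const float LEFT_MEM* lIn, float LEFT_MEM* lOut,
                     const float RIGHT_MEM* rIn, float RIGHT_MEM* rOut,
                     std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        lOut[i] = lIn[i];
        rOut[i] = rIn[i];
    }
}
```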
As expected 2 cycles saved
Parallel dm and pm reads and writes
Why software loop?
The compiler does not know the size of the loop, so it can't optimize the loop
THIS PRAGMA IS A CONTRACT BETWEEN THE DEVELOPER AND COMPILER – DON'T LIE
This does not compile
Pragma variables not handled by preprocessor
Variable as end of loop
Compiler will not optimize when the loop parameter is declared external, internal, or static
Loop parameters all constants known to compiler
Drop from 8 cycles to 2 cycles as compiler knows enough to switch to hardware loop control – STALLS FROM JUMP GONE
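The difference between a loop bound the compiler can see and one it cannot can be sketched in portable C++ as below. The function names and the 0.5 scale factor are made up for illustration. Whether a particular compiler actually switches to a hardware loop has to be confirmed in the generated assembly, as the slides do.

```cpp
#include <cstddef>

constexpr std::size_t kNumPoints = 128;  // trip count known at compile time

// Every loop parameter is a compile-time constant.
void scaleFixed(const float* in, float* out) {
    for (std::size_t i = 0; i < kNumPoints; ++i) out[i] = 0.5f * in[i];
}

// The trip count only arrives at run time, so the compiler must generate
// code that copes with any n, including 0.
void scaleVariable(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = 0.5f * in[i];
}

// Middle ground: the function stays general, but each instantiation has a
// constant trip count deduced from the array type.
template <std::size_t N>
void scaleArray(const float (&in)[N], float (&out)[N]) {
    for (std::size_t i = 0; i < N; ++i) out[i] = 0.5f * in[i];
}
```

The template version is one way to keep the "more general" C code from earlier in the deck without giving up constant loop bounds.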
Where am I getting all my info?
Can we switch to SIMD mode?
VECTORIZATION
MAY NOT BE POSSIBLE IF COMPILER DOES NOT KNOW ABOUT ALIGNMENT OF ARRAYS
(How arrays are placed in memory)
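In standard C++ the alignment of an array can be stated with alignas, which is a portable stand-in for the vendor alignment pragmas; the sketch below is not the SHARC syntax. It assumes a SIMD access touches two adjacent floats, so the array should start on a two-float boundary. The names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPoints = 128;
// Assumption: one SIMD access reads two adjacent floats.
constexpr std::size_t kPairAlign = 2 * sizeof(float);

// The declaration itself tells the compiler where the array may start.
alignas(kPairAlign) float alignedLeft[kPoints];

// Run-time check, handy in a debug build or a unit test.
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A pointer into the middle of the array, such as alignedLeft + 1, is no longer pair-aligned, which is exactly the case a compiler has to assume when it only sees a bare float* parameter.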
Impact of vectorization
Before -- loop count was 0x80
With memory operations of the form r2 = dm(i4, m6) where m6 = 1, meaning code is doing r2 = *i4++;
New instructions – SIMD mode
bit set mode1 0x200000 switches SIMD mode on (bit clr mode1 0x200000 switches it off)
Processor doing r2 = dm(i5, 2)
Same as r2 = dm(i5, 1) AND s2 = dm(i5, 1)
Loading two registers
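The effect of loading two registers per access can be mimicked in plain C++ by handling two points per iteration. This is a host-side illustration, not SHARC code: the local variables r2 and s2 merely borrow the register names from the slide, and the function names are hypothetical. Both functions return the number of loop iterations so the halving is visible.

```cpp
#include <cstddef>

// One point per iteration: n iterations.
std::size_t copyScalar(const float* in, float* out, std::size_t n) {
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i];
        ++iterations;
    }
    return iterations;
}

// Two points per iteration, the way a single SIMD access fills r2 and s2
// and then steps the pointer by 2. n is assumed to be even.
std::size_t copyPairs(const float* in, float* out, std::size_t n) {
    std::size_t iterations = 0;
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        const float r2 = in[i];      // element for the first processing unit
        const float s2 = in[i + 1];  // element for the second processing unit
        out[i] = r2;
        out[i + 1] = s2;
        ++iterations;
    }
    return iterations;
}
```

For n = 128 the scalar loop runs 0x80 times and the paired loop 0x40 times.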
Try using #pragma inline
BEFORE
AFTER (20 cycles faster?)
C++ showing out of order execution
WARNING
Let's do "inline"
ProcessOneBlock( ) is called by four subroutines – let's in
Mixed mode view is interesting
Mixed Mode
Out-of-order execution with 4 copies of the code for DoCopyBlock( ) (one for each of Process0, Process1, Process2, Process3)
NO CODE OF ProcessOneBlock( )
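The call structure described here can be sketched as follows. The function names echo the ones on the slides in lower camel case, but the bodies are guesses, since the original source is not in the transcript. In standard C++ the inline keyword is only a request; the SHARC tool chain's #pragma inline and the optimizer settings decide what really happens.

```cpp
#include <cstddef>

constexpr std::size_t kBlock = 128;

// The loop that does the real work.
inline void doCopyBlock(const float* in, float* out) {
    for (std::size_t i = 0; i < kBlock; ++i) out[i] = in[i];
}

// Thin wrapper. Once it is inlined into its callers, no stand-alone copy
// of it needs to exist in the executable.
inline void processOneBlock(const float* in, float* out) {
    doCopyBlock(in, out);
}

// Four callers: after inlining, each carries its own copy of the loop.
void process0(const float* in, float* out) { processOneBlock(in, out); }
void process1(const float* in, float* out) { processOneBlock(in, out); }
void process2(const float* in, float* out) { processOneBlock(in, out); }
void process3(const float* in, float* out) { processOneBlock(in, out); }
```

The trade is code size for call overhead: four copies of the loop body instead of one shared subroutine.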
Speed improvement
Moving from software loop and using dm and pm memories caused a change from 8 cycles / pt to 2 cycles for two points processed in SIMD (4 CALLS * 7 CYCLES SAVED * N POINTS PROCESSED)
Moving to IN_LINE causes a change of around 120 cycles for each subroutine call (4 CALLS * 120 CYCLES SAVED)
N = 128 -- (4 * 1800 to 4 * 120)
480 MHz processor -- 15 us to 1 us
LESSON LEARNT – SPEND YOUR TIME OPTIMIZING THE LOOPS – REST IS SMALLER AND GETS SMALLER WITH LARGER N
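The conversion from cycle counts to time can be checked with a few lines. The cycle counts and the 480 MHz clock are the figures quoted above; the constant and function names are made up for this sketch.

```cpp
constexpr double kClockHz = 480e6;  // 480 MHz processor, from the slide

// Convert a cycle count to microseconds at the clock rate above.
constexpr double cyclesToMicroseconds(double cycles) {
    return cycles / kClockHz * 1e6;
}

constexpr double kBeforeCycles = 4 * 1800;  // 4 calls, 1800 cycles each
constexpr double kAfterCycles  = 4 * 120;   // 4 calls, 120 cycles each
```

7200 cycles at 480 MHz is 15 us and 480 cycles is 1 us, matching the figures on the slide.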
Other improvements depend on code characteristics and specifics
Profile guided optimization
Memory alignment can be important
After the first char fetch, the system can switch to moving 8 chars at a time in SIMD
Conditional code (manual PGO)
Correct ways to process loops
#pragma all_aligned
#pragma loop_unroll N
#pragma SIMD_for
#pragma align num
#pragma alignment_region( and #pragma alignment_region_end