TigerSHARC CLU Closer look at the XCORRS
The practice
Suppose we have a vector of in-phase and out-of-phase (quadrature) data, gathered over an antenna from a satellite for example. Gain issues scale the values by 16:
-16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, etc.
Question: suppose the original data from the satellite had this form:
-1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, ...
By how much is the satellite data delayed? For this example: 0, 3, 6, 9, 12, etc.
Tackle the issue with FIR
First: modify the correlation function to handle complex values. Ignore that issue for the moment.
Imagine 1024 data points and 1024 PRN values.
We need to do 1024 FIR operations, each of 1024 taps.
We know how to optimize to do 2 taps every cycle (one in X and one in Y).
Execution time is 1024 * 512 cycles, about 1 ms at 500 MHz.
XCORRS can do 8 * 16 = 128 taps each cycle in each compute block: 128 times faster than 2 taps per cycle when both compute blocks are used
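The cycle counts above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch in Python: the variable names are made up for illustration, and it assumes ideal throughput (no load, store, or setup cycles) with both compute blocks kept busy.

```python
# Back-of-envelope cycle counts for the 1024-point, 1024-tap example.
N_SHIFTS = 1024      # one FIR result per candidate shift
N_TAPS = 1024        # taps per FIR result
CLOCK_HZ = 500e6     # 500 MHz clock, as quoted in the text

total_taps = N_SHIFTS * N_TAPS

# Hand-optimized FIR: 2 taps per cycle (one in X, one in Y).
fir_cycles = total_taps // 2
fir_ms = fir_cycles / CLOCK_HZ * 1e3

# XCORRS: 8 data values against 16 shifts per cycle, in each of 2 compute blocks.
xcorrs_taps_per_cycle = 8 * 16 * 2
xcorrs_cycles = total_taps // xcorrs_taps_per_cycle

speedup = fir_cycles / xcorrs_cycles

print(fir_cycles, round(fir_ms, 3), xcorrs_cycles, speedup)
```

Under these assumptions the FIR approach needs 524288 cycles (about 1.05 ms) and the ideal XCORRS ratio comes out at 128.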
Where does the CLU fit in?
XCORRS definition
THEORY: Mathematical definition
Uses registers TR, D, C
and something called CUT
Satellite data
Quad fetch brings in 8 complex values (8 bits each for the real and imaginary parts)
Pattern here is -1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j, …
PRN code: 2-bit complex number
Seems strange to have two dummy bits
But it actually makes sense
PRN: -1 - j, 1 + j, 1 + j, -1 - j, 1 + j, 1 + j, …
+1 and -1 are associated with PSK; more in the next lecture
Problem: binary means 1 and 0, so how do we represent 1 and -1?
The value 0x3 goes in as C15 and C16
0011 -- C15 = -1 - j, C16 = +1 + j
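The 0011 example can be turned into a small encode/decode sketch. The bit convention here is an assumption read off that one example, not taken from the processor manual: a 1 bit stands for -1, a 0 bit for +1, and the lower-numbered code sits in the lower bits. Which bit of the pair is real and which is imaginary is also assumed. The function names are hypothetical.

```python
def decode_prn(word, count):
    """Unpack `count` 2-bit complex PRN codes from an integer, lowest code first.

    Assumed convention: bit = 1 means -1, bit = 0 means +1.
    Assumed layout: upper bit of each pair is the real part, lower bit the imaginary part.
    """
    codes = []
    for k in range(count):
        field = (word >> (2 * k)) & 0b11
        re = -1 if field & 0b10 else 1
        im = -1 if field & 0b01 else 1
        codes.append(complex(re, im))
    return codes


def encode_prn(codes):
    """Pack +/-1 +/-j codes into an integer using the same assumed convention."""
    word = 0
    for k, z in enumerate(codes):
        field = (0b10 if z.real < 0 else 0) | (0b01 if z.imag < 0 else 0)
        word |= field << (2 * k)
    return word


pair = decode_prn(0b0011, 2)   # [C15, C16] in the example above
print(pair, hex(encode_prn(pair)))
```

With this convention the pair decodes to C15 = -1 - j and C16 = +1 + j, matching the example, and encoding it again gives back 0x3.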
Loading the THR registers
Standard XCORRS instruction
Lower 46 bits of THR1:0
R7:4
TR0, TR1, TR2, … TR15
TR15:0 = XCORRS(R7:4, THR3:0)
TR0 += D7 * C22 + D6 * C21 + … (8 taps)
TR1 += D7 * C21 + D6 * C20 + … (8 taps)
…
TR15 += D7 * C7 + D6 * C6 + … (8 taps)
64 taps each cycle, on both X and Y compute blocks, if set up properly
128 taps each cycle; these are “complex taps”, compared to 2 real taps / cycle after Lab 3
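The tap pattern in the listing (D7 pairs with C22 in TR0, with C21 in TR1, and so on down to C7 in TR15) can be written as a short behavioural model. This is a sketch in Python, not the hardware definition: the function name is hypothetical, the index rule `C[i + 15 - k]` is read off the listing, and a plain complex multiply (no conjugation, no saturation) is assumed.

```python
def xcorrs(tr, d, c):
    """Sketch of TR15:0 = XCORRS(R7:4, THR3:0) with no CUT option.

    tr: 16 accumulators TR0..TR15 (updated in place)
    d:  8 data values D0..D7
    c:  23 code values C0..C22
    """
    if len(tr) != 16 or len(d) != 8 or len(c) != 23:
        raise ValueError("expected 16 accumulators, 8 data values, 23 codes")
    for k in range(16):            # accumulator TRk
        for i in range(8):         # data value Di
            tr[k] += d[i] * c[i + 15 - k]
    return tr


# Usage: with only C22 non-zero, the single product D7 * C22 lands in TR0.
d = [1, 2, 3, 4, 5, 6, 7, 8]       # D0..D7
c = [0] * 23
c[22] = 1
tr = xcorrs([0] * 16, d, c)
print(tr)
```

Each call performs 16 * 8 = 128 multiply-accumulates in this model, one row of 8 per accumulator.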
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)
TR0 += D7 * C22 + D6 * C21 + … (8 taps)
TR1 += D7 * C21 + D6 * C20 + … (8 taps)
…
TR14 += D7 * C8 + D6 * C7 (2 taps)
TR15 += D7 * C7 (1 tap)
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)
TR0 += D7 * C22 + D6 * C21 + … (8 taps)
TR1 += D7 * C21 + D6 * C20 + … (7 taps)
…
TR7 += D7 * C15 (1 tap)
TR8 += 0 (0 taps)
…
TR15 += 0 (0 taps)
TR15:0 = XCORRS(R7:4, THR3:0) (CUT +15)
TR0 += 0 (0 taps)
TR1 += D0 * C14 (1 tap)
…
TR7 += D6 * C14 + D5 * C13 + … (7 taps)
TR8 += D7 * C14 + D6 * C13 + … (8 taps)
…
TR15 += D7 * C7 + D6 * C6 + … (8 taps)
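The three CUT listings are consistent with one simple rule: a negative CUT drops products whose code index is below the magnitude of CUT, and a positive CUT drops products whose code index is at or above it. The sketch below encodes that rule in Python. It is inferred only from the tap listings in this section, not from the hardware reference, and the function names are hypothetical.

```python
def xcorrs_cut(tr, d, c, cut=0):
    """Sketch of TR15:0 = XCORRS(R7:4, THR3:0) (CUT cut), inferred from the listings.

    cut < 0: keep only products with code index >= -cut
    cut > 0: keep only products with code index <  cut
    cut = 0: keep all 8 products per accumulator
    """
    for k in range(16):
        for i in range(8):
            j = i + 15 - k                 # code index paired with Di in TRk
            if cut < 0 and j < -cut:
                continue
            if cut > 0 and j >= cut:
                continue
            tr[k] += d[i] * c[j]
    return tr


def tap_counts(cut):
    """Number of products accumulated into TR0..TR15 for a given CUT value."""
    return xcorrs_cut([0] * 16, [1] * 8, [1] * 23, cut)


for cut in (0, -7, -15, 15):
    print(cut, tap_counts(cut))
```

Under this rule CUT -7 tapers TR9..TR15 from 7 taps down to 1, CUT -15 tapers TR0..TR7 from 8 down to 1 and leaves TR8..TR15 untouched, and CUT +15 ramps TR0..TR7 from 0 up to 7 with full 8-tap sums from TR8 onward.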
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)
TR0 += D7 * C22 + D6 * C21 + … (8 taps)
TR1 += D7 * C21 + D6 * C20 + … (7 taps)
…
TR7 += D7 * C15 (1 tap)
TR8 += 0 (0 taps)
…
TR15 += 0 (0 taps)
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)
TR0 += D7 * C22 + D6 * C21 + … (8 taps)
TR1 += D7 * C21 + D6 * C20 + … (8 taps)
…
TR14 += D7 * C8 + D6 * C7 (2 taps)
TR15 += D7 * C7 (1 tap)
TR15:0 = XCORRS(R7:4, THR3:0)
TR0 += D7 * C22 + D6 * C21 + … (8 taps)
TR1 += D7 * C21 + D6 * C20 + … (8 taps)
…
TR15 += D7 * C7 + D6 * C6 + … (8 taps)
64 taps each cycle, on both X and Y compute blocks, if set up properly
128 taps each cycle; these are “complex taps”, compared to 2 real taps / cycle after Lab 3
Problem at this point: THR3:2 is empty
Need to bring in more PRN values
TR15:0 = XCORRS(R7:4, THR3:0) (CUT +15)
TR0 += 0 (0 taps)
TR1 += D0 * C14 (1 tap)
…
TR7 += D6 * C14 + D5 * C13 + … (7 taps)
TR8 += D7 * C14 + D6 * C13 + … (8 taps)
…
TR15 += D7 * C7 + D6 * C6 + … (8 taps)
Final Result
Maximum correlation occurs every 3 shifts, which is what we expect
But are these the correct results?
Correlation: result expected
In step:
-1 + 0j, 1 + 0j, 1 + 0j, … 16 times
with
-1 - j, 1 + j, 1 + j, … 16 times
-1 * -1 + 1 * 1 + 1 * 1 + … = +48 = 0x30 -- real component
Out of step:
-1 + 0j, 1 + 0j, 1 + 0j, … 16 times
with
1 + j, 1 + j, -1 - j, … 16 times
-1 * 1 + 1 * 1 + 1 * -1 + … = -16 = -0x10 = 0xFFF0 (16-bit two's complement)
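The expected values can be reproduced with a direct correlation in Python. This is a sketch for checking the arithmetic only: it uses a circular shift and a plain complex product (no conjugation), both of which are assumptions, so only the real component is compared with the figures above. The function name is hypothetical.

```python
# Data and PRN patterns from the worked example, each triple repeated 16 times.
data = [-1 + 0j, 1 + 0j, 1 + 0j] * 16
prn = [-1 - 1j, 1 + 1j, 1 + 1j] * 16


def correlate(data, prn, shift):
    """Sum of data[i] * prn[i + shift], wrapping around at the end of prn."""
    n = len(data)
    return sum(data[i] * prn[(i + shift) % n] for i in range(n))


in_step = int(correlate(data, prn, 0).real)
out_of_step = int(correlate(data, prn, 1).real)
print(hex(in_step), hex(out_of_step & 0xFFFF))

# Real component for the first six shifts: the peak repeats every 3 shifts.
peaks = [int(correlate(data, prn, s).real) for s in range(6)]
print(peaks)
```

The in-step real component is 48 (0x30), the out-of-step one is -16 (0xFFF0 as a 16-bit value), and the peak repeats every 3 shifts.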
Final Result
1) Now have correlation values for 16 shifts in the TR registers: store them to external memory. Repeat for all other necessary shifts, then find the maximum
2) Now make parallel in SISD mode
3) Now make parallel in SIMD mode
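Step 1 above can be sketched as an outer loop that produces correlation values 16 shifts at a time, appends each block to a results list (standing in for external memory), and then searches for the maximum. This is a plain Python illustration, not TigerSHARC code. The function names are hypothetical, and the 13-value test sequence (a Barker code) is chosen only because it gives a single clear peak, unlike the period-3 example.

```python
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]   # illustrative test code


def correlate_block(data, prn, first_shift, width=16):
    """Correlation values for `width` consecutive shifts (one TR15:0 worth)."""
    n = len(data)
    return [sum(data[i] * prn[(i + s) % n] for i in range(n))
            for s in range(first_shift, first_shift + width)]


def find_delay(data, prn, n_shifts):
    """Collect results block by block, then return the shift with the largest real part."""
    results = []                                   # stands in for external memory
    for first in range(0, n_shifts, 16):
        results.extend(correlate_block(data, prn, first))
    best = max(range(n_shifts), key=lambda s: results[s].real)
    return best, results


prn = [complex(a, a) for a in BARKER13]            # codes of the form +/-(1 + j)
delay = 5
data = [complex(BARKER13[(i + delay) % 13], 0) for i in range(13)]
best, results = find_delay(data, prn, 16)
print(best, results[best])
```

Here the search recovers the delay of 5 that was built into the test data.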
Take home Quiz 4
Old requirement
Do Lab 4 with FFT and XCORRS
Write tests and demonstrate XCORRS used for correlation
a) Not in parallel instruction format, but in a loop
b) Now do in optimized SISD mode
c) Now do in optimized SIMD mode