Details .L and .S units, TMS320C6000. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004.

Details .L and .S units

Let us have a look at the final details concerning the functional units. Consider first the case of the .L and .S units.

Operands: 32/40-bit register, 5-bit constant

Operands can be 5-bit constants (or 16-bit constants for MVKL and MVKH), 32-bit registers, or 40-bit registers.

However, we have seen that registers are only 32 bits wide. So where do the 40-bit registers come from?

A 40-bit register can be obtained by concatenating two registers. However, there are three conditions that need to be respected:
(1) The registers must be from the same side.
(2) The first register must be even and the second odd.
(3) The registers must be consecutive.
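
The three conditions can be checked mechanically. The following Python sketch is only an illustration: the function name is_valid_pair and the "odd:even" string format are assumptions made for this example, not part of any TI tool, and it assumes register files A0 to A15 and B0 to B15.

```python
import re

def is_valid_pair(pair: str) -> bool:
    """Return True if `pair` (e.g. "A5:A4") names a legal 40-bit register pair.

    Hypothetical helper for illustration; assumes registers A0-A15 and B0-B15.
    """
    match = re.fullmatch(r"([AB])(\d+):([AB])(\d+)", pair)
    if match is None:
        return False
    side_hi, num_hi, side_lo, num_lo = match.groups()
    odd, even = int(num_hi), int(num_lo)
    same_side = side_hi == side_lo        # condition 1: same side
    low_is_even = even % 2 == 0           # condition 2: even register, then odd
    consecutive = odd == even + 1         # condition 3: consecutive registers
    in_range = odd <= 15                  # stay inside the 16-register file
    return same_side and low_is_even and consecutive and in_range

print(is_valid_pair("A5:A4"))   # True
print(is_valid_pair("A4:A3"))   # False: A3 is odd, so it cannot be the low half
print(is_valid_pair("B1:A0"))   # False: the registers are on different sides
```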

All combinations of 40-bit registers are shown below. Each pair is written odd:even; the even register holds the 32 least significant bits and the odd register holds the remaining 8 bits.

Side A: A1:A0, A3:A2, A5:A4, A7:A6, A9:A8, A11:A10, A13:A12, A15:A14
Side B: B1:B0, B3:B2, B5:B4, B7:B6, B9:B8, B11:B10, B13:B12, B15:B14
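
To make the odd:even layout concrete, the sketch below splits a 40-bit value into the two register images. It assumes the usual C6000 convention that the even register holds the 32 least significant bits and the low 8 bits of the odd register hold the most significant bits. The names split40 and join40 are made up for this illustration.

```python
MASK40 = (1 << 40) - 1

def split40(value: int) -> tuple[int, int]:
    """Split a 40-bit value into (odd, even) register contents."""
    value &= MASK40                  # keep only 40 bits (two's complement wrap)
    even = value & 0xFFFFFFFF        # 32 least significant bits
    odd = (value >> 32) & 0xFF       # 8 most significant bits
    return odd, even

def join40(odd: int, even: int) -> int:
    """Rebuild the 40-bit value from the (odd, even) register contents."""
    return ((odd & 0xFF) << 32) | (even & 0xFFFFFFFF)

# 0xFFFFFFFF + 1 no longer fits in 32 bits, but it fits in a pair such as A5:A4.
odd, even = split40(0xFFFFFFFF + 1)
print(hex(odd), hex(even))           # 0x1 0x0
```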

Instruction format for the .L or .S unit:

    instr .unit <src>, <src>, <dst>

One <src> can be a 32-bit register or a 5-bit constant; the other <src> can be a 32-bit or a 40-bit register. The <dst> can be a 32-bit or a 40-bit register.

Examples:

    OR  .L1  A0, A1, A2
    ADD .L2  -5, B3, B4
    ADD .L1  A2, A3, A5:A4
    SUB .L1  A2, A5:A4, A5:A4
    ADD .L2  3, B9:B8, B9:B8

Register to register data transfer

To move the content of a register (A or B) to another register (B or A), use the move instruction MV, e.g.:

    MV  A0, B0
    MV  B6, B7

To move the content of a control register to another register (A or B), or vice versa, use the MVC instruction, e.g.:

    MVC IFR, A0
    MVC A0, IRP

Increasing the processing power

Code Review (using side A only)

$$ Y = \sum_{n=1}^{40} a_n \, x_n $$

            MVK  .S1  40, A2        ; A2 = 40, loop count
    loop:   LDH  .D1  *A5++, A0     ; A0 = a(n)
            LDH  .D1  *A6++, A1     ; A1 = x(n)
            MPY  .M1  A0, A1, A3    ; A3 = a(n) * x(n)
            ADD  .L1  A4, A3, A4    ; Y = Y + A3
            SUB  .L1  A2, 1, A2     ; decrement loop count
      [A2]  B    .S1  loop          ; if A2 != 0, branch
            STH  .D1  A4, *A7       ; *A7 = Y

Note: Assume that A4 was previously cleared and the pointers are initialised.
Assume that A2 is B0.
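
As a cross-check of what the loop computes, here is a plain Python model of the sum of products. It is a sketch under stated assumptions: the inputs are treated as signed 16-bit values (as loaded by LDH), the accumulator wraps at 32 bits like a register, and STH stores only the low 16 bits of the result. The function names are illustrative and pipeline delays are not modelled.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def sum_of_products(a: list[int], x: list[int]) -> tuple[int, int]:
    """Model of the loop: returns (32-bit accumulator, halfword stored by STH)."""
    acc = 0                                     # A4 is assumed cleared beforehand
    for a_n, x_n in zip(a, x):                  # one pass per loop iteration
        product = to_signed(a_n, 16) * to_signed(x_n, 16)  # MPY: 16 x 16 bits
        acc = to_signed(acc + product, 32)      # ADD: 32-bit wrap-around
    return acc, acc & 0xFFFF                    # STH keeps the low halfword

a = [1] * 40
x = list(range(1, 41))
print(sum_of_products(a, x))                    # (820, 820)
```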

How can we add more processing power to this processor?

[Figure: functional units .S1, .M1, .L1 and .D1 connected to register file A (A0 to A15, 32 bits each) and to data memory.]

(1) Increase the clock frequency.
(2) Increase the number of processing units.

To increase the processing power, this processor has two sides (A and B, or 1 and 2).

[Figure: side A has units .S1, .M1, .L1 and .D1 with register file A (A0 to A15); side B has units .S2, .M2, .L2 and .D2 with register file B (B0 to B15). Registers are 32 bits wide and both sides connect to data memory.]

Can the two sides exchange operands in order to increase performance?

The answer is YES, but there are limitations.

To exchange operands between the two sides, some cross paths or links are required.

What is a cross path? A cross path links one side of the CPU to the other. There are two types of cross paths: data cross paths and address cross paths.

Data Cross Paths

Data cross paths can also be referred to as register file cross paths. These cross paths allow operands from one side to be used by the other side.

There are only two cross paths: 1X, which conveys data from side B to side A, and 2X, which conveys data from side A to side B.

TMS320C67x Data-Path

Data cross paths only apply to the .L, .S and .M units. The data cross paths are very useful; however, there are some limitations in their use.

Data Cross Path - Limitations

(1) The destination register must be on the same side as the unit.
(2) Source registers: up to one cross path per execute packet per side.

Execute packet: a group of instructions that execute simultaneously.

[Figure: units .L1, .M1 and .S1 on side A take one <src> from register file A and may take the other <src> from register file B through the 1X cross path; <dst> is written to register file A. The 2X path leads from register file A to side B.]

e.g.:

            ADD .L2x  A0, A1, B2
            MPY .M1x  A0, B6, A9
            SUB .S1x  A8, B2, A8
         || ADD .L1x  A0, B0, A2

|| means that the SUB and ADD belong to the same execute packet and therefore execute simultaneously.

NOT VALID! The SUB and the ADD would both use the 1X cross path in the same execute packet.
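
Limitation (2) can be expressed as a small check over one execute packet. The Python sketch below is illustrative only: it looks at the unit field of each instruction (for example .S1x) and counts how many instructions on each side request the cross path. The function name and the input format are assumptions for this example; a real assembler performs many more checks.

```python
from collections import Counter

def cross_path_conflicts(units: list[str]) -> list[str]:
    """Return the cross paths requested more than once in one execute packet.

    `units` holds the unit field of each instruction in the packet, e.g. ".S1x".
    A trailing "x" means the instruction reads one source through the cross path
    of its own side: 1X for side 1 (A), 2X for side 2 (B).
    """
    used = Counter()
    for unit in units:
        name = unit.lstrip(".").upper()     # ".S1x" -> "S1X"
        if name.endswith("X"):
            side = name[1]                  # "1" or "2"
            used[side + "X"] += 1
    return sorted(path for path, count in used.items() if count > 1)

# SUB .S1x || ADD .L1x: both need 1X in the same execute packet.
print(cross_path_conflicts([".S1x", ".L1x"]))   # ['1X']
# One cross path per side is within the limit.
print(cross_path_conflicts([".S1x", ".L2x"]))   # []
```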

Data Cross Paths for both sides

[Figure: the side A units (.L1, .M1, .S1) can take one <src> from register file B through 1X; the side B units (.L2, .M2, .S2) can take one <src> from register file A through 2X. Each <dst> stays on the unit's own side.]

Address cross paths

(1) The pointer must be on the same side as the unit.

            LDW .D1T1  *A0, A5
            STW .D1T1  A5, *A0

Load or store to either side

Data paths: DA1 = T1, DA2 = T2.

            LDW .D1T1  *A0, A5
            LDW .D1T2  *A0, B5

[Figure: .D1 takes the address from *A0; the data goes to A5 through Data1 or to B5 through Data2.]

Standard Parallel Loads

            LDW .D1T1  *A0, A5
         || LDW .D2T2  *B0, B5

Parallel Load/Store using address cross paths

            LDW .D1T2  *A0, B5
         || STW .D2T1  A5, *B0

Fill the blanks ... Does this work?

            LDW .D1__  *A0, B5
         || STW .D2__  B6, *B0

Not Allowed! Parallel accesses: both cross or neither cross.

            LDW .D1T2  *A0, B5
         || STW .D2T2  B6, *B0

Both instructions would need the same data path, T2 (DA2).
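
The "both cross or neither cross" rule follows from each data path (T1 or T2) serving only one load or store per execute packet. A hedged Python sketch of that check is given below; the function names and the input format (the unit field, such as .D1T2) are assumptions made for this illustration.

```python
def parse_mem_unit(unit: str) -> tuple[str, str]:
    """Split a unit field such as ".D1T2" into (address unit, data path)."""
    name = unit.lstrip(".").upper()
    if len(name) != 4 or name[:2] not in ("D1", "D2") or name[2:] not in ("T1", "T2"):
        raise ValueError(f"not a .D unit with a data path: {unit!r}")
    return name[:2], name[2:]

def parallel_access_ok(unit_a: str, unit_b: str) -> bool:
    """True if two loads/stores may share an execute packet under this model."""
    d_a, t_a = parse_mem_unit(unit_a)
    d_b, t_b = parse_mem_unit(unit_b)
    return d_a != d_b and t_a != t_b         # distinct .D units and distinct T paths

print(parallel_access_ok(".D1T1", ".D2T2"))  # True: neither access crosses
print(parallel_access_ok(".D1T2", ".D2T1"))  # True: both accesses cross
print(parallel_access_ok(".D1T2", ".D2T2"))  # False: both would use T2
```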

Conditions Don't Use Cross Paths

If a conditional register comes from the opposite side, it does NOT use a data or address cross path.

Examples:

      [B2]  ADD .L1  A2, A0, A4
      [A1]  LDW .D2  *B0, B5

'C6x Data-Path - Summary

[Figure: full 'C67x CPU data path; see the CPU Ref Guide, Full CPU Datapath (Pg 2-2).]
Cross Paths - Cross Paths - SummarySummary
DataData Destination register on same side as unit.Destination register on same side as unit. Source registers - up to one cross path per Source registers - up to one cross path per
execute packet per side.execute packet per side. Use “x” to indicate cross-path.Use “x” to indicate cross-path.
AddressAddress Pointer must be on same side as unit.Pointer must be on same side as unit. Data can be transferred to/from either side.Data can be transferred to/from either side. Parallel accesses: both cross or neither cross.Parallel accesses: both cross or neither cross.
ConditionalsConditionals Don’tDon’t Use Cross Paths. Use Cross Paths.