Details: .L and .S units, TMS320C6000. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004.


Page 1:

Details: .L and .S units

TMS320C6000

Page 2:

Details: .L and .S units

Let us have a look at the final details concerning the functional units.

Consider first the case of the .L and .S units.

Page 3:

Operands: 32/40-bit register, 5-bit constant

Operands can be:
(1) 5-bit constants (or 16-bit constants for MVKL and MVKH).
(2) 32-bit registers.
(3) 40-bit registers.

However, we have seen that registers are only 32 bits wide. So where do the 40-bit registers come from?

Page 4:

Operands: 32/40-bit register, 5-bit constant

A 40-bit register can be obtained by concatenating two registers.

However, there are 3 conditions that need to be respected:
(1) The registers must be from the same side.
(2) The lower-numbered register must be even and the other one odd.
(3) The registers must be consecutive.

Page 5:

Operands: 32/40-bit register, 5-bit constant

All combinations of 40-bit registers are shown below. Each pair is written odd:even. The even register holds the lower 32 bits and the odd register holds the upper 8 bits.

    Side A      Side B
    A1:A0       B1:B0
    A3:A2       B3:B2
    A5:A4       B5:B4
    A7:A6       B7:B6
    A9:A8       B9:B8
    A11:A10     B11:B10
    A13:A12     B13:B12
    A15:A14     B15:B14
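The three pairing conditions can be sketched in a few lines of Python. This is only an illustration, not TI tooling: the function names `is_valid_long_pair` and `all_long_pairs` are hypothetical, and the sketch assumes the 16 registers per side (A0 to A15, B0 to B15) shown in this deck.

```python
def is_valid_long_pair(odd_reg, even_reg):
    """Return True if odd_reg:even_reg (e.g. "A1", "A0") can form a 40-bit register."""
    # Condition 1: both registers come from the same side (A or B).
    same_side = odd_reg[0] == even_reg[0] and odd_reg[0] in "AB"
    odd_n, even_n = int(odd_reg[1:]), int(even_reg[1:])
    # Assumption: registers are numbered 0 to 15 on each side.
    in_range = 0 <= even_n and odd_n <= 15
    # Conditions 2 and 3: even register first, then the next (odd) register.
    return same_side and in_range and even_n % 2 == 0 and odd_n == even_n + 1


def all_long_pairs():
    """List every odd:even pair, A side first, in the order used by the table."""
    return [f"{side}{n + 1}:{side}{n}" for side in "AB" for n in range(0, 16, 2)]
```

For example, `is_valid_long_pair("A5", "A4")` holds, while `("A2", "A1")` fails the even/odd condition and `("B1", "A0")` fails the same-side condition.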

Page 6:

Operands: 32/40-bit register, 5-bit constant

    instr .unit <src>, <src>, <dst>

For the .L or .S units:
    One <src>:        32-bit register or 5-bit constant
    The other <src>:  32-bit register or 40-bit register
    <dst>:            32-bit register or 40-bit register

Page 12:

Operands: 32/40-bit register, 5-bit constant

    instr .unit <src>, <src>, <dst>

Examples for the .L or .S units:

    OR  .L1  A0, A1, A2
    ADD .L2  -5, B3, B4
    ADD .L1  A2, A3, A5:A4
    SUB .L1  A2, A5:A4, A5:A4
    ADD .L2  3, B9:B8, B9:B8
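To see why a 40-bit destination such as A5:A4 is useful, the following Python sketch models how a wide result is split across an odd:even pair. It is a simplified illustration with hypothetical helper names (`split_long`, `join_long`, `add_to_long`), and it assumes unsigned operands; the sign handling of the real instructions is deliberately not modelled.

```python
MASK32 = 0xFFFFFFFF      # width of one register
MASK40 = 0xFFFFFFFFFF    # width of an odd:even register pair


def split_long(value):
    """Split a 40-bit value into (odd, even): upper 8 bits and lower 32 bits."""
    value &= MASK40
    return (value >> 32) & 0xFF, value & MASK32


def join_long(odd, even):
    """Rebuild the 40-bit value held in an odd:even pair."""
    return ((odd & 0xFF) << 32) | (even & MASK32)


def add_to_long(src1, src2):
    """Add two 32-bit values (treated as unsigned here) into a 40-bit pair."""
    return split_long((src1 & MASK32) + (src2 & MASK32))
```

With this model, adding 0xFFFFFFFF and 1 no longer wraps to zero: the carry lands in the odd register, giving the pair (1, 0).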

Page 13:

Register to register data transfer

To move the content of a register (A or B) to another register (B or A), use the move (MV) instruction, e.g.:

    MV  A0, B0
    MV  B6, B7

To move the content of a control register to another register (A or B) or vice versa, use the MVC instruction, e.g.:

    MVC IFR, A0
    MVC A0, IRP

Page 14:

Increasing the processing power

TMS320C6000

Page 15:

Code Review (using side A only)

$$Y = \sum_{n=1}^{40} a_n \times x_n$$

            MVK  .S1  40, A2       ; A2 = 40, loop count
    loop:   LDH  .D1  *A5++, A0    ; A0 = a(n)
            LDH  .D1  *A6++, A1    ; A1 = x(n)
            MPY  .M1  A0, A1, A3   ; A3 = a(n) * x(n)
            ADD  .L1  A4, A3, A4   ; Y = Y + A3
            SUB  .L1  A2, 1, A2    ; decrement loop count
     [A2]   B    .S1  loop         ; if A2 ≠ 0, branch
            STH  .D1  A4, *A7      ; *A7 = Y

Note: Assume that A4 was previously cleared and the pointers are initialised.

Assume that A2 is B0.
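As a cross-check of what the loop computes, here is a minimal Python sketch of the same sum of products. The function name `dot_product` is hypothetical, the variable comments only indicate which register each value plays the role of, and the sketch does not model the 16-bit halfword loads and store.

```python
def dot_product(a, x):
    """Sum of a[n] * x[n], following the structure of the assembly loop."""
    y = 0            # plays the role of A4, assumed cleared beforehand
    count = len(a)   # plays the role of A2, the loop counter (40 on the slide)
    i = 0            # stands in for the post-incremented pointers A5 and A6
    while count != 0:        # [A2] B loop: branch while the counter is non-zero
        y += a[i] * x[i]     # MPY followed by ADD
        i += 1
        count -= 1           # SUB A2, 1, A2
    return y
```

For instance, `dot_product([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18.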

Page 16:

Increasing the processing power!

How can we add more processing power to this processor?

[Figure: units .S1, .M1, .L1 and .D1 connected to Register File A (A0 to A15, 32 bits each) and to Data Memory.]

Page 17:

Increasing the processing power!

(1) Increase the clock frequency.
(2) Increase the number of processing units.

[Figure: units .S1, .M1, .L1 and .D1 connected to Register File A (A0 to A15, 32 bits each) and to Data Memory.]

Page 18:

To increase the processing power, this processor has two sides (A and B, or 1 and 2).

[Figure: Data Memory connected to two sides. Side A: units .S1, .M1, .L1 and .D1 with Register File A (A0 to A15, 32 bits each). Side B: units .S2, .M2, .L2 and .D2 with Register File B (B0 to B15, 32 bits each).]

Page 19:

Can the two sides exchange operands in order to increase performance?

[Figure: Data Memory connected to two sides. Side A: units .S1, .M1, .L1 and .D1 with Register File A (A0 to A15, 32 bits each). Side B: units .S2, .M2, .L2 and .D2 with Register File B (B0 to B15, 32 bits each).]

Page 20:

The answer is YES, but there are limitations.

To exchange operands between the two sides, some cross paths or links are required.

What is a cross path?
A cross path links one side of the CPU to the other.

There are two types of cross paths:
(1) Data cross paths.
(2) Address cross paths.

Page 21:

Data Cross Paths

Data cross paths can also be referred to as register file cross paths.

These cross paths allow operands from one side to be used by the other side.

There are only two cross paths:
(1) One path which conveys data from side B to side A, 1X.
(2) One path which conveys data from side A to side B, 2X.

Page 22:

TMS320C67x Data-Path

Data cross paths only apply to the .L, .S and .M units.

The data cross paths are very useful; however, there are some limitations in their use.

Page 23:

Data Cross Path - Limitations

(1) The destination register must be on the same side as the unit.

(2) Source registers: up to one cross path per execute packet per side.

Execute packet: group of instructions that execute simultaneously.

[Figure: units .L1, .M1 and .S1 with their <src> and <dst> connections to register file A, the 1x cross path from register file B, and the 2x cross path towards side B.]

Page 24:

Data Cross Path - Limitations

e.g.:

       ADD .L2x  A0, A1, B2
       MPY .M1x  A0, B6, A9
       SUB .S1x  A8, B2, A8
    || ADD .L1x  A0, B0, A2

"||" means that the SUB and ADD belong to the same execute packet and therefore execute simultaneously.

[Figure: side A units .L1, .M1 and .S1 with the 1x and 2x cross paths.]

Page 25:

Data Cross Path - Limitations

e.g.:

       ADD .L2x  A0, A1, B2
       MPY .M1x  A0, B6, A9
       SUB .S1x  A8, B2, A8
    || ADD .L1x  A0, B0, A2

NOT VALID! The SUB and ADD execute in parallel and both read a side B operand through the 1x cross path, which breaks limitation (2): up to one cross path per execute packet per side.

[Figure: side A units .L1, .M1 and .S1 with the 1x and 2x cross paths.]
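The two data cross path limitations can be expressed as a small checker. The Python sketch below is an illustration only: `register_side` and `check_execute_packet` are hypothetical names, instructions are reduced to (unit, sources, destination) tuples with register operands only, and nothing beyond the two limitations stated on these slides is checked.

```python
def register_side(reg):
    """Map a register name to its side: A registers -> 1, B registers -> 2."""
    return 1 if reg[0] == "A" else 2


def check_execute_packet(packet):
    """Return a list of rule violations for one execute packet.

    packet is a list of (unit, sources, destination) tuples,
    e.g. ("S1", ["A8", "B2"], "A8") for SUB .S1x A8,B2,A8.
    """
    errors = []
    cross_uses = {1: 0, 2: 0}  # instructions using the 1X / 2X path in this packet
    for unit, sources, dst in packet:
        unit_side = int(unit[-1])
        # Limitation (1): destination register on the same side as the unit.
        if register_side(dst) != unit_side:
            errors.append(f".{unit}: destination {dst} is not on side {unit_side}")
        # An instruction uses the cross path if any source is on the other side.
        if any(register_side(src) != unit_side for src in sources):
            cross_uses[unit_side] += 1
    # Limitation (2): up to one cross path per execute packet per side.
    for side, count in cross_uses.items():
        if count > 1:
            errors.append(f"{count} instructions use the {side}X cross path")
    return errors
```

Passing the parallel pair from the slide, `[("S1", ["A8", "B2"], "A8"), ("L1", ["A0", "B0"], "A2")]`, reports one violation because both instructions need the 1X path, whereas the MPY on its own passes.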

Page 26:

Data Cross Paths for both sides

[Figure: side A units .L1, .M1 and .S1 and side B units .L2, .M2 and .S2, each with <src> and <dst> connections to its own register file, linked by the 1x and 2x cross paths.]

Page 27:

Address cross paths

    LDW .D1T1  *A0, A5
    STW .D1T1  A5, *A0

(1) The pointer must be on the same side as the unit.

[Figure: unit .D1 with its address (Addr) and data (Data) connections to register file A.]

Page 28:

Load or store to either side

    LDW .D1T1  *A0, A5
    LDW .D1T2  *A0, B5

DA1 = T1
DA2 = T2

[Figure: unit .D1 uses the pointer *A0; Data1 goes to A5 on side A and Data2 goes to B5 on side B.]

Page 29:

Standard Parallel Loads

       LDW .D1T1  *A0, A5
    || LDW .D2T2  *B0, B5

DA1 = T1
DA2 = T2

[Figure: unit .D1 uses pointer *A0 and loads A5 on side A; unit .D2 uses pointer *B0 and loads B5 on side B.]

Page 30:

Parallel Load/Store using address cross paths

       LDW .D1T2  *A0, B5
    || STW .D2T1  A5, *B0

DA1 = T1
DA2 = T2

[Figure: unit .D1 uses pointer *A0 and loads B5 on side B; unit .D2 uses pointer *B0 and stores A5 from side A.]

Page 31:

Fill the blanks ... Does this work?

       LDW .D1__  *A0, B5
    || STW .D2__  B6, *B0

DA1 = T1
DA2 = T2

[Figure: units .D1 and .D2 with pointers *A0 and *B0.]

Page 32:

Not Allowed! Parallel accesses: both cross or neither cross.

       LDW .D1T2  *A0, B5
    || STW .D2T2  B6, *B0

DA2 = T2

[Figure: unit .D1 with pointer *A0 and unit .D2 with pointer *B0 both transfer data (B5 and B6) on side B.]
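The address cross path rules from the last few slides can be collected into one small check. The Python sketch below is an assumption-laden illustration: `reg_side` and `check_parallel_accesses` are hypothetical names, each access is reduced to a (unit, path, pointer, data register) tuple, and the rule that T1 serves side A data and T2 serves side B data is inferred from the examples on these slides.

```python
def reg_side(reg):
    """Map a register name to its side: A registers -> 1, B registers -> 2."""
    return 1 if reg[0] == "A" else 2


def check_parallel_accesses(accesses):
    """Return a list of rule violations for one or two parallel loads/stores.

    accesses is a list of (unit, path, pointer, data) tuples,
    e.g. ("D1", "T2", "A0", "B5") for LDW .D1T2 *A0,B5.
    """
    errors = []
    crossings = []
    for unit, path, pointer, data in accesses:
        unit_side, path_side = int(unit[-1]), int(path[-1])
        # The pointer must be on the same side as the .D unit.
        if reg_side(pointer) != unit_side:
            errors.append(f".{unit}: pointer {pointer} is on the wrong side")
        # Inferred from the slides: T1 moves side A data, T2 moves side B data.
        if reg_side(data) != path_side:
            errors.append(f"{path}: data register {data} is on the wrong side")
        crossings.append(unit_side != path_side)
    # Parallel accesses: both cross or neither cross.
    if len(crossings) == 2 and crossings[0] != crossings[1]:
        errors.append("parallel accesses: both must cross or neither")
    return errors
```

The standard parallel loads and the fully crossed load/store pair both pass, while the pair marked "Not Allowed!" fails because only the load crosses.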

Page 33:

Conditions Don’t Use Cross Paths

If a conditional register comes from the opposite side, it does NOT use a data or address cross-path.

Examples:

    [B2]  ADD .L1  A2, A0, A4
    [A1]  LDW .D2  *B0, B5

Page 34:

‘C6x Data-Path - Summary

[Figure: full ‘C67x CPU datapath. See the CPU Ref Guide, Full CPU Datapath (Pg 2-2).]

Page 35:

Cross Paths - Summary

Data
(1) Destination register on same side as unit.
(2) Source registers: up to one cross path per execute packet per side.
(3) Use “x” to indicate cross-path.

Address
(1) Pointer must be on same side as unit.
(2) Data can be transferred to/from either side.
(3) Parallel accesses: both cross or neither cross.

Conditionals
Don’t use cross paths.