TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

TMS320TMS320C6000C6000Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004

Architectural OverviewArchitectural Overview

2

Describe C6000C6000 CPU architectureCPU architecture

Introduce some basic instructionsbasic instructions

Describe the C6000 memory mapC6000 memory map

Provide an overview of the overview of the peripheralsperipherals

Learning ObjectivesLearning Objectives

3

General DSP System Block General DSP System Block DiagramDiagram

4

It has been shown in It has been shown in IntroductionIntroduction that that SOPSOP is is the key element for most the key element for most

DSP algorithms.DSP algorithms.

So let’s write the code So let’s write the code for this algorithm and at for this algorithm and at the same time discover the same time discover the C6000 architecture.the C6000 architecture.

Two basicTwo basic

operationsoperations are required are required

for this algorithm.for this algorithm.

(1) (1) MultiplicationMultiplication

(2) (2) Addition Addition

Therefore two basicTherefore two basic

instructionsinstructions are required are required

Y =Y =NN

aann xxnnn = 1n = 1

**

= a= a11 ** x x1 1 + + aa2 2 * * xx22 ++... ... ++ a aN N ** xxNN

Implementation of Sum of Implementation of Sum of Products (Products (SOPSOP))

The implementation in this The implementation in this module will be done inmodule will be done in

assemblyassembly

5

MultiplyMultiply ( (MPYMPY))

The The multiplicationmultiplication of of aa11 by x by x1 1 is is done in assembly by the following done in assembly by the following instruction:instruction:

MPY MPY a1, x1, Ya1, x1, Y

This instruction is This instruction is performed by a multiplier performed by a multiplier

unit that is called unit that is called “.M”“.M”

Y =Y =NN

aann xxnnn = 1n = 1

**

= a= a11 ** x x1 1 + + aa2 2 * * xx22 ++... ... ++ a aN N ** xxNN

6

MultiplyMultiply ( (.M unit.M unit))

.M.M.M.M

Y =Y =4040

aann x xnnn = 1n = 1

**

The The .M.M unit performs multiplications in unit performs multiplications in hardware hardware

MPYMPY .M.M a1, x1, Ya1, x1, Y

Note:Note: 16-bit by 16-bit multiplier provides a 32-bit result. 16-bit by 16-bit multiplier provides a 32-bit result.

32-bit by 32-bit multiplier provides a 64-bit result.32-bit by 32-bit multiplier provides a 64-bit result.

7

AdditionAddition ((.L unit.L unit))

.M.M.M.M

.L.L.L.L

Y =Y =4040


**

MPYMPY .M.M a1, x1, proda1, x1, prod

ADDADD .L.L Y, prod, YY, prod, Y

RISC processors such as the C6000 RISC processors such as the C6000 use registers to hold the operands, so use registers to hold the operands, so

lets change this code.lets change this code.

RISC processors such as the C6000 RISC processors such as the C6000 use registers to hold the operands, so use registers to hold the operands, so

lets change this code.lets change this code.

8

Register Register FileFile AA

Y =Y =4040


**

MPYMPY .M.M a1, x1, proda1, x1, prod

ADDADD .L.L Y, prod, YY, prod, Y

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4



.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155

Register Register File AFile A

..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

Let us correct this by Let us correct this by replacing replacing aa, , xx, , prodprod

and and YY by the registers by the registers as shown above.as shown above.

Let us correct this by Let us correct this by replacing replacing aa, , xx, , prodprod

and and YY by the registers by the registers as shown above.as shown above.

9

Specifying Register Specifying Register NamesNames

Register Register File AFile A contains contains 1616 registers registers (A0 -A15) which are (A0 -A15) which are

3232-bits wide.-bits wide.

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155


..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

10

DataData loading (load loading (load unitunit .D .D))

Q: How do we load the Q: How do we load the operands into the registers?operands into the registers?

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155


..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

A: The operands are A: The operands are loadedloaded into the registers by loading into the registers by loading them from the them from the memorymemory using the using the .D.D unit. unit.

.D.D.D.D

Data MemoryData MemoryData MemoryData Memory

It is worth noting at this It is worth noting at this stage that the only way stage that the only way

to access memory is to access memory is through the .D unit.through the .D unit.

It is worth noting at this It is worth noting at this stage that the only way stage that the only way

to access memory is to access memory is through the .D unit.through the .D unit.

11

Load InstructionLoad Instruction

Q: Which instruction(s) can be Q: Which instruction(s) can be used for loading used for loading operandsoperands from the memory to the from the memory to the registers?registers?.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155


..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

.D.D.D.D


12

Load InstructionsLoad Instructions

A: The A: The load instructionsload instructions..

Q: Which instruction(s) can be Q: Which instruction(s) can be used for loading used for loading operandsoperands from the memory to the from the memory to the registers?registers?.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155


..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

.D.D.D.D


LDLDBB, LD, LDHHLDLDWW,LD,LDDWDW

13

Using the Load Using the Load InstructionsInstructions

Before using the load Before using the load unit you have to be unit you have to be

aware that this processor aware that this processor is is byte addressable, ,

which means that each which means that each byte is represented by a byte is represented by a

unique address.unique address.

Also the addresses are 32-bit wide.

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

Byte

14

The syntax for the load The syntax for the load instruction is:instruction is:

Where:Where:

RnRn is a register that contains is a register that contains the address of the operand to the address of the operand to be loaded be loaded

andand

RmRm is the destination register. is the destination register.


LD *LD *RnRn,,RmRm

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

a1a1x1x1

prodprod

YY

15


The question now is how The question now is how many bytes are going to be many bytes are going to be loaded into the destination loaded into the destination register?register?


LD *Rn,RmLD *Rn,Rm

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

a1a1x1x1

prodprod

YY

16


LD *Rn,RmLD *Rn,Rm


The answer, is that it depends on The answer, is that it depends on the instruction you choose:the instruction you choose:• LDLDBB: loads one byte: loads one byte (8-bit)(8-bit)

• LDLDHH: loads half word: loads half word (16-bit)(16-bit)

• LDLDWW: loads a word: loads a word (32-bit)(32-bit)

• LDLDDWDW: loads a double word: loads a double word (64-bit)(64-bit)

Note: LD on its own does not Note: LD on its own does not existexist..

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

a1a1x1x1

prodprod

YY

17


Example:Example:

If we assume that If we assume that A5 = 0x4A5 = 0x4 then: then:

(1)(1) LDLDBB *A5, A7 ; *A5, A7 ;

gives A7 = 0x000000gives A7 = 0x0000000101

(2)(2) LDLDHH *A5,A7; *A5,A7;

gives A7 = 0x0000gives A7 = 0x000002010201

(3)(3) LDLDWW *A5,A7; *A5,A7;

gives A7 = 0xgives A7 = 0x0403020104030201

(3)(3) LDLDDWDW *A5,A7:A6; *A5,A7:A6;

gives A7:A6 = gives A7:A6 = 0x0x08070605040302010807060504030201

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

0xB0xB0xA0xA

0xD0xD0xC0xC

0x0x01010x0x0202

0x0x03030x0x0404

0x0x05050x0x0606

0x0x07070x0x0808

0011The syntax for the load The syntax for the load instruction is:instruction is:

LD *Rn,RmLD *Rn,Rm

Little Endianla parte meno significativava memorizzata per prima,

all'indirizzo più bassodi memoria

18


Q: If data can only be accessed by Q: If data can only be accessed by the load instruction and the .D the load instruction and the .D unit, how can we load the register unit, how can we load the register pointer Rn in the first place?pointer Rn in the first place?

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

0xB0xB0xA0xA

0xD0xD0xC0xC

0x10x10x20x2

0x30x30x40x4

0x50x50x60x6

0x70x70x80x8

0011The syntax for the load The syntax for the load instruction is:instruction is:

LD *Rn,RmLD *Rn,Rm

19

The instruction The instruction MVKLMVKL will allow a move of a will allow a move of a 16-16-bitbit constant into a register as shown below: constant into a register as shown below:

MVKL .? a, A5(‘a’ is a constant or label)

How many bits represent a full address?How many bits represent a full address?

32 bits32 bits

So why does the instruction not allow a 32-bit So why does the instruction not allow a 32-bit move?move?

All instructions are All instructions are 32-bit wide32-bit wide (see instruction (see instruction opcode).opcode).

Loading the Loading the PointerPointer **RnRn

20

To solve this problem another instruction To solve this problem another instruction is available: is available: MVMVKHKH

Loading the Loading the Pointer Pointer **RnRn

eg. eg. MVMVKHKH .?.? aa, A5, A5

(‘a’ is a constant or label) (‘a’ is a constant or label)

aahh

aahh xx

aall aa

A5A5

Finally, to move the 32-bit address to a Finally, to move the 32-bit address to a register we can use:register we can use:

MVKL a, A5

MVKH a, A5

21

Always use Always use MVMVKLKL then then MVMVKHKH, look at the , look at the following examples:following examples:

Loading the Loading the Pointer Pointer **RnRn

MVKL 0x1234FABC, A5

A5 = 0xFFFFFABC Wrong

Example 1 (A5 = 0x8765432187654321)

MVKL 0x1234FABC, A5

A5 = 0xFFFFFABC (sign extension)

MVKHMVKH 0x0x1234FABC1234FABC, A5 , A5

A5 = 0xA5 = 0x1234FABC1234FABC OK OK

Example 2 (A5 = 0x8765432187654321)

MVKH 0x1234FABC, A5

A5 = 0x12344321

22

LDLDHH, , MVMVKLKL and and MVMVKHKH

MVKMVKLL pt1pt1, A5, A5 MVKMVKHH pt1pt1, A5 , A5

MVKMVKLL pt2pt2, A6, A6 MVKMVKHH pt2pt2, A6, A6

LDLDHH .D.D *A5, A0*A5, A0

LDLDHH .D.D *A6, A1*A6, A1



pt1 and pt2 point to some locations

in the data memory.

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155


..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

.D.D.D.D


Le componentiaiai e xixi sono numeri

interi da 16 bits16 bits

23

Creating a loopCreating a loop

MVKL MVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5

MVKL MVKL pt2, A6pt2, A6 MVKH MVKH pt2, A6pt2, A6

LDHLDH .D.D *A5, A0*A5, A0




So far we have only So far we have only implemented the SOP implemented the SOP for one tap only, i.e.for one tap only, i.e.

Y= aY= a11 ** x x11

So let’s create a So let’s create a looploop so that we can so that we can implement the SOP implement the SOP for N Taps.for N Taps.

24

Creating a loop –Creating a loop – B B instructioninstruction

With the C6000 processors With the C6000 processors there are there are no dedicated no dedicated

instructions such as block instructions such as block repeatrepeat. The loop is created . The loop is created

using the using the BB instructioninstruction..

So far we have only So far we have only implemented the SOP implemented the SOP for one tap only, i.e.for one tap only, i.e.

Y= aY= a11 ** x x11

So let’s create a So let’s create a looploop so that we can so that we can implement the SOP implement the SOP for N Taps.for N Taps.

25

What are the steps for What are the steps for creating a creating a looploop ? ?

Create a Create a labellabel to branch to. to branch to.

Add a Add a branch instructionbranch instruction, B., B.

Create a Create a loop counter loop counter (LC).(LC).

Add an instruction toAdd an instruction to decrement the LC decrement the LC..

Make the Make the branch conditionalbranch conditional based on LCbased on LC..

26

1. Create a1. Create a label label to branch to branch toto

MVKLMVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5


looploop LDHLDH .D.D *A5, A0*A5, A0



ADDADD .L.L A4, A3, A4 A4, A3, A4

27

MVKLMVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5


loop loop LDHLDH .D.D *A5, A0*A5, A0




BB .?.? looploop

2. Add a 2. Add a branch branch instructioninstruction, B., B.

28

Which Which unitunit is used by the B is used by the B instruction?instruction?


.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15


..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

MVKLMVKL .S.S pt1, A5pt1, A5 MVKH MVKH .S.S pt1, A5 pt1, A5

MVKL MVKL .S.S pt2, A6pt2, A6 MVKH MVKH .S.S pt2, A6pt2, A6





BB .S.S looploop

29

3. Create a 3. Create a loop loop countercounter..

MVKLMVKL .S .S pt1, A5pt1, A5 MVKH MVKH .S .S pt1, A5 pt1, A5

MVKL MVKL .S .S pt2, A6pt2, A6 MVKH MVKH .S .S pt2, A6pt2, A6

MVKLMVKL .S.S count, B0count, B0





BB .S.S looploop

B registersB registers will be introduced later will be introduced later


.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15


..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15


..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

30

4. 4. Decrement the loop Decrement the loop countercounter

MVKLMVKL .S .S pt1, A5pt1, A5 MVKH MVKH .S .S pt1, A5 pt1, A5

MVKL MVKL .S .S pt2, A6pt2, A6 MVKH MVKH .S .S pt2, A6pt2, A6

MVKLMVKL .S.S count, count, B0B0





SUBSUB .S.S B0, 1, B0B0, 1, B0

BB .S.S looploop


.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15


..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15


..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

31

What is the syntax for making instruction What is the syntax for making instruction conditional?conditional?

[[conditioncondition] ] InstructionInstructionLabelLabel

e.g.e.g. [[B1B1]] BB looploop

(1) (1) The condition can be one of the following The condition can be one of the following registers: registers: A1A1, , A2A2, , B0B0, , B1B1, , B2B2..

(2) (2) Any instruction can be conditional.Any instruction can be conditional.

5. Make the branch conditional 5. Make the branch conditional based on the value in the loop based on the value in the loop

countercounter

32

The condition can be inverted by adding the The condition can be inverted by adding the exclamation symbol exclamation symbol ““!!”” as follows: as follows:

[[!!condition] condition] InstructionInstruction LabelLabel

e.g.e.g.

[[!!B0]B0] BB loop; branch if loop; branch if B0B0 = = 00

[B0][B0] BB loop; branch if loop; branch if B0B0 != != 00

5. Make the branch conditional 5. Make the branch conditional based on the value in the loop based on the value in the loop

countercounter

33

MVKLMVKL .S2.S2 pt1, A5pt1, A5 MVKH MVKH .S2.S2 pt1, A5 pt1, A5

MVKL MVKL .S2.S2 pt2, A6pt2, A6 MVKH MVKH .S2.S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 count, count, B0B0





SUBSUB .S.S B0, 1, B0, 1, B0B0

[B0][B0] BB .S.S looploop


.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15


..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15


..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

5. Make the 5. Make the branch branch conditionalconditional

34

With this processor all the instructions are With this processor all the instructions are encoded in a 32-bit.encoded in a 32-bit.

Therefore the Therefore the labellabel must have a dynamic range must have a dynamic range of less than 32-bit as the instruction B has to be of less than 32-bit as the instruction B has to be coded.coded.

Case 1: Case 1: BB .S1.S1 labellabel RelativeRelative branch. branch.

Label limited to Label limited to +/- 2+/- 22020 offset offset..

More on the Branch Instruction More on the Branch Instruction (1)(1)

21-bit21-bit relativerelative address addressBB

32-bit32-bit

35

More on the Branch Instruction More on the Branch Instruction (2)(2)

By specifyingBy specifying a register as an operand a register as an operand instead instead of a label, it is possible to have an of a label, it is possible to have an absoluteabsolute branch.branch.

This will allow a dynamic range of This will allow a dynamic range of 223232..

Case 2: Case 2: B B .S2.S2 registerregister AbsoluteAbsolute branch. branch.

Operates on .S2 ONLY!Operates on .S2 ONLY!

5-bit register 5-bit register codecodeBB

32-bit32-bit

36

Testing the codeTesting the code

This code performsThis code performs

the following operations:the following operations:

aa00*x*x00 + a + a00*x*x00 + a + a00*x*x00 + +

… … + a+ a00*x*x00

However, we would like to perform:However, we would like to perform:

aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +

… … + a+ aNN*x*xNN

MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5

MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 count, B0count, B0







37

Modifying the pointersModifying the pointers

The solution isThe solution is

to modify to modify

the pointersthe pointers

A5A5 and and A6A6




loop loop LDHLDH .D.D *A5*A5, A0, A0

LDHLDH .D.D *A6*A6, A1, A1





38

Indexing PointersIndexing Pointers

DescriptionDescription

PointerPointer

+ Pre-offset+ Pre-offset

- Pre-offset- Pre-offset

Pre-incrementPre-increment

Pre-decrementPre-decrement

Post-incrementPost-increment

Post-decrementPost-decrement

SyntaxSyntax PointerPointerModifiedModified

*R*R

*+R[*+R[dispdisp]]

*-R[*-R[dispdisp]]

*++R[*++R[dispdisp]]

*--R[*--R[dispdisp]]

*R++[*R++[dispdisp]]

*R--[*R--[dispdisp]]

NoNo

NoNo

NoNo

YesYes

YesYes

YesYes

YesYes

[disp] specifies # elements - size in DW, W, H, or B.[disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant.disp = R or 5-bit constant. R can be any register.R can be any register.

39

Modify and testing the Modify and testing the codecode

This code now performsThis code now performs


aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +

... + a... + aNN*x*xNN




loop loop LDHLDH .D.D *A5*A5++++, A0, A0

LDHLDH .D.D *A6*A6++++, A1, A1





40

Store the final resultStore the final result

This code now performsThis code now performs


aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +

... + a... + aNN*x*xNN




loop loop LDHLDH .D.D *A5++, A0*A5++, A0

LDHLDH .D.D *A6++, A1*A6++, A1


ADDADD .L.L A4, A3, A4, A3, A4A4



STHSTH .D.D A4, *A7A4, *A7

41

Store the final resultStore the final result

The Pointer The Pointer A7A7

is now initialised.is now initialised.



MVKLMVKL .S2.S2 pt3, A7pt3, A7MVKHMVKH .S2.S2 pt3, A7pt3, A7MVKLMVKL .S2.S2 count, B0count, B0







STHSTH .D.D A4, A4, *A7*A7

42

What is the initial value of What is the initial value of A4?A4?

A4A4 is used as is used as

an accumulator,an accumulator,

so it needs to beso it needs to be

reset to zero.reset to zero.



MVKLMVKL .S2.S2 pt3, A7pt3, A7MVKHMVKH .S2.S2 pt3, A7pt3, A7MVKLMVKL .S2.S2 count, B0count, B0ZEROZERO .L.L A4A4




ADDADD .L.L A4A4, A3, , A3, A4A4



STHSTH .D.D A4, *A7A4, *A7

TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

Documents

Transcript of TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.