TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

42
TMS320 TMS320 C6000 C6000 m Dahnoun, Bristol University, (c) Texas Instruments 2004 m Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Over Architectural Over

Transcript of TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

Page 1: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

TMS320TMS320C6000C6000Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004

Architectural OverviewArchitectural Overview

Page 2: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

2

Describe C6000C6000 CPU architectureCPU architecture

Introduce some basic instructionsbasic instructions

Describe the C6000 memory mapC6000 memory map

Provide an overview of the overview of the peripheralsperipherals

Learning ObjectivesLearning Objectives

Page 3: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

3

General DSP System Block General DSP System Block DiagramDiagram

Page 4: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

4

It has been shown in It has been shown in IntroductionIntroduction that that SOPSOP is is the key element for most the key element for most

DSP algorithms.DSP algorithms.

So let’s write the code So let’s write the code for this algorithm and at for this algorithm and at the same time discover the same time discover the C6000 architecture.the C6000 architecture.

Two basicTwo basic

operationsoperations are required are required

for this algorithm.for this algorithm.

(1) (1) MultiplicationMultiplication

(2) (2) Addition Addition

Therefore two basicTherefore two basic

instructionsinstructions are required are required

Y =Y =NN

aann xxnnn = 1n = 1

**

= a= a11 ** x x1 1 + + aa2 2 * * xx22 ++... ... ++ a aN N ** xxNN

Implementation of Sum of Implementation of Sum of Products (Products (SOPSOP))

The implementation in this The implementation in this module will be done inmodule will be done in

assemblyassembly

Page 5: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

5

MultiplyMultiply ( (MPYMPY))

The The multiplicationmultiplication of of aa11 by x by x1 1 is is done in assembly by the following done in assembly by the following instruction:instruction:

MPY MPY a1, x1, Ya1, x1, Y

This instruction is This instruction is performed by a multiplier performed by a multiplier

unit that is called unit that is called “.M”“.M”

Y =Y =NN

aann xxnnn = 1n = 1

**

= a= a11 ** x x1 1 + + aa2 2 * * xx22 ++... ... ++ a aN N ** xxNN

Page 6: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

6

MultiplyMultiply ( (.M unit.M unit))

.M.M.M.M

Y =Y =4040

aann x xnnn = 1n = 1

**

The The .M.M unit performs multiplications in unit performs multiplications in hardware hardware

MPYMPY .M.M a1, x1, Ya1, x1, Y

Note:Note: 16-bit by 16-bit multiplier provides a 32-bit result. 16-bit by 16-bit multiplier provides a 32-bit result.

32-bit by 32-bit multiplier provides a 64-bit result.32-bit by 32-bit multiplier provides a 64-bit result.

Page 7: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

7

AdditionAddition ((.L unit.L unit))

.M.M.M.M

.L.L.L.L

Y =Y =4040

aann x xnnn = 1n = 1

**

MPYMPY .M.M a1, x1, proda1, x1, prod

ADDADD .L.L Y, prod, YY, prod, Y

RISC processors such as the C6000 RISC processors such as the C6000 use registers to hold the operands, so use registers to hold the operands, so

lets change this code.lets change this code.

RISC processors such as the C6000 RISC processors such as the C6000 use registers to hold the operands, so use registers to hold the operands, so

lets change this code.lets change this code.

Page 8: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

8

Register Register FileFile AA

Y =Y =4040

aann x xnnn = 1n = 1

**

MPYMPY .M.M a1, x1, proda1, x1, prod

ADDADD .L.L Y, prod, YY, prod, Y

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155

Register Register File AFile A

..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

Let us correct this by Let us correct this by replacing replacing aa, , xx, , prodprod

and and YY by the registers by the registers as shown above.as shown above.

Let us correct this by Let us correct this by replacing replacing aa, , xx, , prodprod

and and YY by the registers by the registers as shown above.as shown above.

Page 9: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

9

Specifying Register Specifying Register NamesNames

Register Register File AFile A contains contains 1616 registers registers (A0 -A15) which are (A0 -A15) which are

3232-bits wide.-bits wide.

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155

Register Register File AFile A

..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

Page 10: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

10

DataData loading (load loading (load unitunit .D .D))

Q: How do we load the Q: How do we load the operands into the registers?operands into the registers?

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155

Register Register File AFile A

..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

A: The operands are A: The operands are loadedloaded into the registers by loading into the registers by loading them from the them from the memorymemory using the using the .D.D unit. unit.

.D.D.D.D

Data MemoryData MemoryData MemoryData Memory

It is worth noting at this It is worth noting at this stage that the only way stage that the only way

to access memory is to access memory is through the .D unit.through the .D unit.

It is worth noting at this It is worth noting at this stage that the only way stage that the only way

to access memory is to access memory is through the .D unit.through the .D unit.

Page 11: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

11

Load InstructionLoad Instruction

Q: Which instruction(s) can be Q: Which instruction(s) can be used for loading used for loading operandsoperands from the memory to the from the memory to the registers?registers?.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155

Register Register File AFile A

..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

.D.D.D.D

Data MemoryData MemoryData MemoryData Memory

Page 12: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

12

Load InstructionsLoad Instructions

A: The A: The load instructionsload instructions..

Q: Which instruction(s) can be Q: Which instruction(s) can be used for loading used for loading operandsoperands from the memory to the from the memory to the registers?registers?.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155

Register Register File AFile A

..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

.D.D.D.D

Data MemoryData MemoryData MemoryData Memory

LDLDBB, LD, LDHHLDLDWW,LD,LDDWDW

Page 13: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

13

Using the Load Using the Load InstructionsInstructions

Before using the load Before using the load unit you have to be unit you have to be

aware that this processor aware that this processor is is byte addressable, ,

which means that each which means that each byte is represented by a byte is represented by a

unique address.unique address.

Also the addresses are 32-bit wide.

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

Byte

Page 14: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

14

The syntax for the load The syntax for the load instruction is:instruction is:

Where:Where:

RnRn is a register that contains is a register that contains the address of the operand to the address of the operand to be loaded be loaded

andand

RmRm is the destination register. is the destination register.

Using the Load Using the Load InstructionsInstructions

LD *LD *RnRn,,RmRm

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

a1a1x1x1

prodprod

YY

Page 15: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

15

The syntax for the load The syntax for the load instruction is:instruction is:

The question now is how The question now is how many bytes are going to be many bytes are going to be loaded into the destination loaded into the destination register?register?

Using the Load Using the Load InstructionsInstructions

LD *Rn,RmLD *Rn,Rm

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

a1a1x1x1

prodprod

YY

Page 16: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

16

The syntax for the load The syntax for the load instruction is:instruction is:

LD *Rn,RmLD *Rn,Rm

Using the Load Using the Load InstructionsInstructions

The answer, is that it depends on The answer, is that it depends on the instruction you choose:the instruction you choose:• LDLDBB: loads one byte: loads one byte (8-bit)(8-bit)

• LDLDHH: loads half word: loads half word (16-bit)(16-bit)

• LDLDWW: loads a word: loads a word (32-bit)(32-bit)

• LDLDDWDW: loads a double word: loads a double word (64-bit)(64-bit)

Note: LD on its own does not Note: LD on its own does not existexist..

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

a1a1x1x1

prodprod

YY

Page 17: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

17

Using the Load Using the Load InstructionsInstructions

Example:Example:

If we assume that If we assume that A5 = 0x4A5 = 0x4 then: then:

(1)(1) LDLDBB *A5, A7 ; *A5, A7 ;

gives A7 = 0x000000gives A7 = 0x0000000101

(2)(2) LDLDHH *A5,A7; *A5,A7;

gives A7 = 0x0000gives A7 = 0x000002010201

(3)(3) LDLDWW *A5,A7; *A5,A7;

gives A7 = 0xgives A7 = 0x0403020104030201

(3)(3) LDLDDWDW *A5,A7:A6; *A5,A7:A6;

gives A7:A6 = gives A7:A6 = 0x0x08070605040302010807060504030201

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

0xB0xB0xA0xA

0xD0xD0xC0xC

0x0x01010x0x0202

0x0x03030x0x0404

0x0x05050x0x0606

0x0x07070x0x0808

0011The syntax for the load The syntax for the load instruction is:instruction is:

LD *Rn,RmLD *Rn,Rm

Little Endianla parte meno significativava memorizzata per prima,

all'indirizzo più bassodi memoria

Page 18: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

18

Using the Load Using the Load InstructionsInstructions

Q: If data can only be accessed by Q: If data can only be accessed by the load instruction and the .D the load instruction and the .D unit, how can we load the register unit, how can we load the register pointer Rn in the first place?pointer Rn in the first place?

0000000000000000

0000000200000002

0000000400000004

0000000600000006

0000000800000008

DataData

16-bits16-bits

addressaddress

FFFFFFFFFFFFFFFF

0xB0xB0xA0xA

0xD0xD0xC0xC

0x10x10x20x2

0x30x30x40x4

0x50x50x60x6

0x70x70x80x8

0011The syntax for the load The syntax for the load instruction is:instruction is:

LD *Rn,RmLD *Rn,Rm

Page 19: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

19

The instruction The instruction MVKLMVKL will allow a move of a will allow a move of a 16-16-bitbit constant into a register as shown below: constant into a register as shown below:

MVKL .? a, A5(‘a’ is a constant or label)

How many bits represent a full address?How many bits represent a full address?

32 bits32 bits

So why does the instruction not allow a 32-bit So why does the instruction not allow a 32-bit move?move?

All instructions are All instructions are 32-bit wide32-bit wide (see instruction (see instruction opcode).opcode).

Loading the Loading the PointerPointer **RnRn

Page 20: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

20

To solve this problem another instruction To solve this problem another instruction is available: is available: MVMVKHKH

Loading the Loading the Pointer Pointer **RnRn

eg. eg. MVMVKHKH .?.? aa, A5, A5

(‘a’ is a constant or label) (‘a’ is a constant or label)

aahh

aahh xx

aall aa

A5A5

Finally, to move the 32-bit address to a Finally, to move the 32-bit address to a register we can use:register we can use:

MVKL a, A5

MVKH a, A5

Page 21: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

21

Always use Always use MVMVKLKL then then MVMVKHKH, look at the , look at the following examples:following examples:

Loading the Loading the Pointer Pointer **RnRn

MVKL 0x1234FABC, A5

A5 = 0xFFFFFABC Wrong

Example 1 (A5 = 0x8765432187654321)

MVKL 0x1234FABC, A5

A5 = 0xFFFFFABC (sign extension)

MVKHMVKH 0x0x1234FABC1234FABC, A5 , A5

A5 = 0xA5 = 0x1234FABC1234FABC OK OK

Example 2 (A5 = 0x8765432187654321)

MVKH 0x1234FABC, A5

A5 = 0x12344321

Page 22: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

22

LDLDHH, , MVMVKLKL and and MVMVKHKH

MVKMVKLL pt1pt1, A5, A5 MVKMVKHH pt1pt1, A5 , A5

MVKMVKLL pt2pt2, A6, A6 MVKMVKHH pt2pt2, A6, A6

LDLDHH .D.D *A5, A0*A5, A0

LDLDHH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

pt1 and pt2 point to some locations

in the data memory.

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A4A4

A1A155

Register Register File AFile A

..

..

..

a1a1x1x1

prodprod

32-bits32-bits

YY

..

..

..

.D.D.D.D

Data MemoryData MemoryData MemoryData Memory

Le componentiaiai e xixi sono numeri

interi da 16 bits16 bits

Page 23: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

23

Creating a loopCreating a loop

MVKL MVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5

MVKL MVKL pt2, A6pt2, A6 MVKH MVKH pt2, A6pt2, A6

LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

So far we have only So far we have only implemented the SOP implemented the SOP for one tap only, i.e.for one tap only, i.e.

Y= aY= a11 ** x x11

So let’s create a So let’s create a looploop so that we can so that we can implement the SOP implement the SOP for N Taps.for N Taps.

Page 24: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

24

Creating a loop –Creating a loop – B B instructioninstruction

With the C6000 processors With the C6000 processors there are there are no dedicated no dedicated

instructions such as block instructions such as block repeatrepeat. The loop is created . The loop is created

using the using the BB instructioninstruction..

So far we have only So far we have only implemented the SOP implemented the SOP for one tap only, i.e.for one tap only, i.e.

Y= aY= a11 ** x x11

So let’s create a So let’s create a looploop so that we can so that we can implement the SOP implement the SOP for N Taps.for N Taps.

Page 25: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

25

What are the steps for What are the steps for creating a creating a looploop ? ?

Create a Create a labellabel to branch to. to branch to.

Add a Add a branch instructionbranch instruction, B., B.

Create a Create a loop counter loop counter (LC).(LC).

Add an instruction toAdd an instruction to decrement the LC decrement the LC..

Make the Make the branch conditionalbranch conditional based on LCbased on LC..

Page 26: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

26

1. Create a1. Create a label label to branch to branch toto

MVKLMVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5

MVKL MVKL pt2, A6pt2, A6 MVKH MVKH pt2, A6pt2, A6

looploop LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4 A4, A3, A4

Page 27: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

27

MVKLMVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5

MVKL MVKL pt2, A6pt2, A6 MVKH MVKH pt2, A6pt2, A6

loop loop LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

BB .?.? looploop

2. Add a 2. Add a branch branch instructioninstruction, B., B.

Page 28: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

28

Which Which unitunit is used by the B is used by the B instruction?instruction?

Data MemoryData MemoryData MemoryData Memory

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

MVKLMVKL .S.S pt1, A5pt1, A5 MVKH MVKH .S.S pt1, A5 pt1, A5

MVKL MVKL .S.S pt2, A6pt2, A6 MVKH MVKH .S.S pt2, A6pt2, A6

loop loop LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

BB .S.S looploop

Page 29: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

29

3. Create a 3. Create a loop loop countercounter..

MVKLMVKL .S .S pt1, A5pt1, A5 MVKH MVKH .S .S pt1, A5 pt1, A5

MVKL MVKL .S .S pt2, A6pt2, A6 MVKH MVKH .S .S pt2, A6pt2, A6

MVKLMVKL .S.S count, B0count, B0

loop loop LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

BB .S.S looploop

B registersB registers will be introduced later will be introduced later

Data MemoryData MemoryData MemoryData Memory

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

Page 30: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

30

4. 4. Decrement the loop Decrement the loop countercounter

MVKLMVKL .S .S pt1, A5pt1, A5 MVKH MVKH .S .S pt1, A5 pt1, A5

MVKL MVKL .S .S pt2, A6pt2, A6 MVKH MVKH .S .S pt2, A6pt2, A6

MVKLMVKL .S.S count, count, B0B0

loop loop LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

SUBSUB .S.S B0, 1, B0B0, 1, B0

BB .S.S looploop

Data MemoryData MemoryData MemoryData Memory

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

Page 31: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

31

What is the syntax for making instruction What is the syntax for making instruction conditional?conditional?

[[conditioncondition] ] InstructionInstructionLabelLabel

e.g.e.g. [[B1B1]] BB looploop

(1) (1) The condition can be one of the following The condition can be one of the following registers: registers: A1A1, , A2A2, , B0B0, , B1B1, , B2B2..

(2) (2) Any instruction can be conditional.Any instruction can be conditional.

5. Make the branch conditional 5. Make the branch conditional based on the value in the loop based on the value in the loop

countercounter

Page 32: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

32

The condition can be inverted by adding the The condition can be inverted by adding the exclamation symbol exclamation symbol ““!!”” as follows: as follows:

[[!!condition] condition] InstructionInstruction LabelLabel

e.g.e.g.

[[!!B0]B0] BB loop; branch if loop; branch if B0B0 = = 00

[B0][B0] BB loop; branch if loop; branch if B0B0 != != 00

5. Make the branch conditional 5. Make the branch conditional based on the value in the loop based on the value in the loop

countercounter

Page 33: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

33

MVKLMVKL .S2.S2 pt1, A5pt1, A5 MVKH MVKH .S2.S2 pt1, A5 pt1, A5

MVKL MVKL .S2.S2 pt2, A6pt2, A6 MVKH MVKH .S2.S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 count, count, B0B0

loop loop LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

SUBSUB .S.S B0, 1, B0, 1, B0B0

[B0][B0] BB .S.S looploop

Data MemoryData MemoryData MemoryData Memory

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.M.M.M.M

.L.L.L.L

A0A0

A1A1

A2A2

A3A3

A15A15

Register File ARegister File A

..

..

..

aaxx

prodprod

32-bits32-bits

YY

.D.D.D.D

.S.S.S.S

5. Make the 5. Make the branch branch conditionalconditional

Page 34: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

34

With this processor all the instructions are With this processor all the instructions are encoded in a 32-bit.encoded in a 32-bit.

Therefore the Therefore the labellabel must have a dynamic range must have a dynamic range of less than 32-bit as the instruction B has to be of less than 32-bit as the instruction B has to be coded.coded.

Case 1: Case 1: BB .S1.S1 labellabel RelativeRelative branch. branch.

Label limited to Label limited to +/- 2+/- 22020 offset offset..

More on the Branch Instruction More on the Branch Instruction (1)(1)

21-bit21-bit relativerelative address addressBB

32-bit32-bit

Page 35: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

35

More on the Branch Instruction More on the Branch Instruction (2)(2)

By specifyingBy specifying a register as an operand a register as an operand instead instead of a label, it is possible to have an of a label, it is possible to have an absoluteabsolute branch.branch.

This will allow a dynamic range of This will allow a dynamic range of 223232..

Case 2: Case 2: B B .S2.S2 registerregister AbsoluteAbsolute branch. branch.

Operates on .S2 ONLY!Operates on .S2 ONLY!

5-bit register 5-bit register codecodeBB

32-bit32-bit

Page 36: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

36

Testing the codeTesting the code

This code performsThis code performs

the following operations:the following operations:

aa00*x*x00 + a + a00*x*x00 + a + a00*x*x00 + +

… … + a+ a00*x*x00

However, we would like to perform:However, we would like to perform:

aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +

… … + a+ aNN*x*xNN

MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5

MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 count, B0count, B0

loop loop LDHLDH .D.D *A5, A0*A5, A0

LDHLDH .D.D *A6, A1*A6, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

SUBSUB .S.S B0, 1, B0B0, 1, B0

[B0][B0] BB .S.S looploop

Page 37: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

37

Modifying the pointersModifying the pointers

The solution isThe solution is

to modify to modify

the pointersthe pointers

A5A5 and and A6A6

MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5

MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 count, B0count, B0

loop loop LDHLDH .D.D *A5*A5, A0, A0

LDHLDH .D.D *A6*A6, A1, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

SUBSUB .S.S B0, 1, B0B0, 1, B0

[B0][B0] BB .S.S looploop

Page 38: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

38

Indexing PointersIndexing Pointers

DescriptionDescription

PointerPointer

+ Pre-offset+ Pre-offset

- Pre-offset- Pre-offset

Pre-incrementPre-increment

Pre-decrementPre-decrement

Post-incrementPost-increment

Post-decrementPost-decrement

SyntaxSyntax PointerPointerModifiedModified

*R*R

*+R[*+R[dispdisp]]

*-R[*-R[dispdisp]]

*++R[*++R[dispdisp]]

*--R[*--R[dispdisp]]

*R++[*R++[dispdisp]]

*R--[*R--[dispdisp]]

NoNo

NoNo

NoNo

YesYes

YesYes

YesYes

YesYes

[disp] specifies # elements - size in DW, W, H, or B.[disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant.disp = R or 5-bit constant. R can be any register.R can be any register.

Page 39: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

39

Modify and testing the Modify and testing the codecode

This code now performsThis code now performs

the following operations:the following operations:

aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +

... + a... + aNN*x*xNN

MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5

MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 count, B0count, B0

loop loop LDHLDH .D.D *A5*A5++++, A0, A0

LDHLDH .D.D *A6*A6++++, A1, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

SUBSUB .S.S B0, 1, B0B0, 1, B0

[B0][B0] BB .S.S looploop

Page 40: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

40

Store the final resultStore the final result

This code now performsThis code now performs

the following operations:the following operations:

aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +

... + a... + aNN*x*xNN

MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5

MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 count, B0count, B0

loop loop LDHLDH .D.D *A5++, A0*A5++, A0

LDHLDH .D.D *A6++, A1*A6++, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4, A3, A4A4

SUBSUB .S.S B0, 1, B0B0, 1, B0

[B0][B0] BB .S.S looploop

STHSTH .D.D A4, *A7A4, *A7

Page 41: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

41

Store the final resultStore the final result

The Pointer The Pointer A7A7

is now initialised.is now initialised.

MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5

MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 pt3, A7pt3, A7MVKHMVKH .S2.S2 pt3, A7pt3, A7MVKLMVKL .S2.S2 count, B0count, B0

loop loop LDHLDH .D.D *A5++, A0*A5++, A0

LDHLDH .D.D *A6++, A1*A6++, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4, A3, A4A4, A3, A4

SUBSUB .S.S B0, 1, B0B0, 1, B0

[B0][B0] BB .S.S looploop

STHSTH .D.D A4, A4, *A7*A7

Page 42: TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.

42

What is the initial value of What is the initial value of A4?A4?

A4A4 is used as is used as

an accumulator,an accumulator,

so it needs to beso it needs to be

reset to zero.reset to zero.

MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5

MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6

MVKLMVKL .S2.S2 pt3, A7pt3, A7MVKHMVKH .S2.S2 pt3, A7pt3, A7MVKLMVKL .S2.S2 count, B0count, B0ZEROZERO .L.L A4A4

loop loop LDHLDH .D.D *A5++, A0*A5++, A0

LDHLDH .D.D *A6++, A1*A6++, A1

MPYMPY .M.M A0, A1, A3A0, A1, A3

ADDADD .L.L A4A4, A3, , A3, A4A4

SUBSUB .S.S B0, 1, B0B0, 1, B0

[B0][B0] BB .S.S looploop

STHSTH .D.D A4, *A7A4, *A7