TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
-
Upload
aliya-whitling -
Category
Documents
-
view
225 -
download
2
Transcript of TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
TMS320TMS320C6000C6000Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004
Architectural OverviewArchitectural Overview
2
Describe C6000C6000 CPU architectureCPU architecture
Introduce some basic instructionsbasic instructions
Describe the C6000 memory mapC6000 memory map
Provide an overview of the overview of the peripheralsperipherals
Learning ObjectivesLearning Objectives
3
General DSP System Block General DSP System Block DiagramDiagram
4
It has been shown in It has been shown in IntroductionIntroduction that that SOPSOP is is the key element for most the key element for most
DSP algorithms.DSP algorithms.
So let’s write the code So let’s write the code for this algorithm and at for this algorithm and at the same time discover the same time discover the C6000 architecture.the C6000 architecture.
Two basicTwo basic
operationsoperations are required are required
for this algorithm.for this algorithm.
(1) (1) MultiplicationMultiplication
(2) (2) Addition Addition
Therefore two basicTherefore two basic
instructionsinstructions are required are required
Y =Y =NN
aann xxnnn = 1n = 1
**
= a= a11 ** x x1 1 + + aa2 2 * * xx22 ++... ... ++ a aN N ** xxNN
Implementation of Sum of Implementation of Sum of Products (Products (SOPSOP))
The implementation in this The implementation in this module will be done inmodule will be done in
assemblyassembly
5
MultiplyMultiply ( (MPYMPY))
The The multiplicationmultiplication of of aa11 by x by x1 1 is is done in assembly by the following done in assembly by the following instruction:instruction:
MPY MPY a1, x1, Ya1, x1, Y
This instruction is This instruction is performed by a multiplier performed by a multiplier
unit that is called unit that is called “.M”“.M”
Y =Y =NN
aann xxnnn = 1n = 1
**
= a= a11 ** x x1 1 + + aa2 2 * * xx22 ++... ... ++ a aN N ** xxNN
6
MultiplyMultiply ( (.M unit.M unit))
.M.M.M.M
Y =Y =4040
aann x xnnn = 1n = 1
**
The The .M.M unit performs multiplications in unit performs multiplications in hardware hardware
MPYMPY .M.M a1, x1, Ya1, x1, Y
Note:Note: 16-bit by 16-bit multiplier provides a 32-bit result. 16-bit by 16-bit multiplier provides a 32-bit result.
32-bit by 32-bit multiplier provides a 64-bit result.32-bit by 32-bit multiplier provides a 64-bit result.
7
AdditionAddition ((.L unit.L unit))
.M.M.M.M
.L.L.L.L
Y =Y =4040
aann x xnnn = 1n = 1
**
MPYMPY .M.M a1, x1, proda1, x1, prod
ADDADD .L.L Y, prod, YY, prod, Y
RISC processors such as the C6000 RISC processors such as the C6000 use registers to hold the operands, so use registers to hold the operands, so
lets change this code.lets change this code.
RISC processors such as the C6000 RISC processors such as the C6000 use registers to hold the operands, so use registers to hold the operands, so
lets change this code.lets change this code.
8
Register Register FileFile AA
Y =Y =4040
aann x xnnn = 1n = 1
**
MPYMPY .M.M a1, x1, proda1, x1, prod
ADDADD .L.L Y, prod, YY, prod, Y
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A4A4
A1A155
Register Register File AFile A
..
..
..
a1a1x1x1
prodprod
32-bits32-bits
YY
..
..
..
Let us correct this by Let us correct this by replacing replacing aa, , xx, , prodprod
and and YY by the registers by the registers as shown above.as shown above.
Let us correct this by Let us correct this by replacing replacing aa, , xx, , prodprod
and and YY by the registers by the registers as shown above.as shown above.
9
Specifying Register Specifying Register NamesNames
Register Register File AFile A contains contains 1616 registers registers (A0 -A15) which are (A0 -A15) which are
3232-bits wide.-bits wide.
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A4A4
A1A155
Register Register File AFile A
..
..
..
a1a1x1x1
prodprod
32-bits32-bits
YY
..
..
..
10
DataData loading (load loading (load unitunit .D .D))
Q: How do we load the Q: How do we load the operands into the registers?operands into the registers?
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A4A4
A1A155
Register Register File AFile A
..
..
..
a1a1x1x1
prodprod
32-bits32-bits
YY
..
..
..
A: The operands are A: The operands are loadedloaded into the registers by loading into the registers by loading them from the them from the memorymemory using the using the .D.D unit. unit.
.D.D.D.D
Data MemoryData MemoryData MemoryData Memory
It is worth noting at this It is worth noting at this stage that the only way stage that the only way
to access memory is to access memory is through the .D unit.through the .D unit.
It is worth noting at this It is worth noting at this stage that the only way stage that the only way
to access memory is to access memory is through the .D unit.through the .D unit.
11
Load InstructionLoad Instruction
Q: Which instruction(s) can be Q: Which instruction(s) can be used for loading used for loading operandsoperands from the memory to the from the memory to the registers?registers?.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A4A4
A1A155
Register Register File AFile A
..
..
..
a1a1x1x1
prodprod
32-bits32-bits
YY
..
..
..
.D.D.D.D
Data MemoryData MemoryData MemoryData Memory
12
Load InstructionsLoad Instructions
A: The A: The load instructionsload instructions..
Q: Which instruction(s) can be Q: Which instruction(s) can be used for loading used for loading operandsoperands from the memory to the from the memory to the registers?registers?.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A4A4
A1A155
Register Register File AFile A
..
..
..
a1a1x1x1
prodprod
32-bits32-bits
YY
..
..
..
.D.D.D.D
Data MemoryData MemoryData MemoryData Memory
LDLDBB, LD, LDHHLDLDWW,LD,LDDWDW
13
Using the Load Using the Load InstructionsInstructions
Before using the load Before using the load unit you have to be unit you have to be
aware that this processor aware that this processor is is byte addressable, ,
which means that each which means that each byte is represented by a byte is represented by a
unique address.unique address.
Also the addresses are 32-bit wide.
0000000000000000
0000000200000002
0000000400000004
0000000600000006
0000000800000008
DataData
16-bits16-bits
addressaddress
FFFFFFFFFFFFFFFF
Byte
14
The syntax for the load The syntax for the load instruction is:instruction is:
Where:Where:
RnRn is a register that contains is a register that contains the address of the operand to the address of the operand to be loaded be loaded
andand
RmRm is the destination register. is the destination register.
Using the Load Using the Load InstructionsInstructions
LD *LD *RnRn,,RmRm
0000000000000000
0000000200000002
0000000400000004
0000000600000006
0000000800000008
DataData
16-bits16-bits
addressaddress
FFFFFFFFFFFFFFFF
a1a1x1x1
prodprod
YY
15
The syntax for the load The syntax for the load instruction is:instruction is:
The question now is how The question now is how many bytes are going to be many bytes are going to be loaded into the destination loaded into the destination register?register?
Using the Load Using the Load InstructionsInstructions
LD *Rn,RmLD *Rn,Rm
0000000000000000
0000000200000002
0000000400000004
0000000600000006
0000000800000008
DataData
16-bits16-bits
addressaddress
FFFFFFFFFFFFFFFF
a1a1x1x1
prodprod
YY
16
The syntax for the load The syntax for the load instruction is:instruction is:
LD *Rn,RmLD *Rn,Rm
Using the Load Using the Load InstructionsInstructions
The answer, is that it depends on The answer, is that it depends on the instruction you choose:the instruction you choose:• LDLDBB: loads one byte: loads one byte (8-bit)(8-bit)
• LDLDHH: loads half word: loads half word (16-bit)(16-bit)
• LDLDWW: loads a word: loads a word (32-bit)(32-bit)
• LDLDDWDW: loads a double word: loads a double word (64-bit)(64-bit)
Note: LD on its own does not Note: LD on its own does not existexist..
0000000000000000
0000000200000002
0000000400000004
0000000600000006
0000000800000008
DataData
16-bits16-bits
addressaddress
FFFFFFFFFFFFFFFF
a1a1x1x1
prodprod
YY
17
Using the Load Using the Load InstructionsInstructions
Example:Example:
If we assume that If we assume that A5 = 0x4A5 = 0x4 then: then:
(1)(1) LDLDBB *A5, A7 ; *A5, A7 ;
gives A7 = 0x000000gives A7 = 0x0000000101
(2)(2) LDLDHH *A5,A7; *A5,A7;
gives A7 = 0x0000gives A7 = 0x000002010201
(3)(3) LDLDWW *A5,A7; *A5,A7;
gives A7 = 0xgives A7 = 0x0403020104030201
(3)(3) LDLDDWDW *A5,A7:A6; *A5,A7:A6;
gives A7:A6 = gives A7:A6 = 0x0x08070605040302010807060504030201
0000000000000000
0000000200000002
0000000400000004
0000000600000006
0000000800000008
DataData
16-bits16-bits
addressaddress
FFFFFFFFFFFFFFFF
0xB0xB0xA0xA
0xD0xD0xC0xC
0x0x01010x0x0202
0x0x03030x0x0404
0x0x05050x0x0606
0x0x07070x0x0808
0011The syntax for the load The syntax for the load instruction is:instruction is:
LD *Rn,RmLD *Rn,Rm
Little Endianla parte meno significativava memorizzata per prima,
all'indirizzo più bassodi memoria
18
Using the Load Using the Load InstructionsInstructions
Q: If data can only be accessed by Q: If data can only be accessed by the load instruction and the .D the load instruction and the .D unit, how can we load the register unit, how can we load the register pointer Rn in the first place?pointer Rn in the first place?
0000000000000000
0000000200000002
0000000400000004
0000000600000006
0000000800000008
DataData
16-bits16-bits
addressaddress
FFFFFFFFFFFFFFFF
0xB0xB0xA0xA
0xD0xD0xC0xC
0x10x10x20x2
0x30x30x40x4
0x50x50x60x6
0x70x70x80x8
0011The syntax for the load The syntax for the load instruction is:instruction is:
LD *Rn,RmLD *Rn,Rm
19
The instruction The instruction MVKLMVKL will allow a move of a will allow a move of a 16-16-bitbit constant into a register as shown below: constant into a register as shown below:
MVKL .? a, A5(‘a’ is a constant or label)
How many bits represent a full address?How many bits represent a full address?
32 bits32 bits
So why does the instruction not allow a 32-bit So why does the instruction not allow a 32-bit move?move?
All instructions are All instructions are 32-bit wide32-bit wide (see instruction (see instruction opcode).opcode).
Loading the Loading the PointerPointer **RnRn
20
To solve this problem another instruction To solve this problem another instruction is available: is available: MVMVKHKH
Loading the Loading the Pointer Pointer **RnRn
eg. eg. MVMVKHKH .?.? aa, A5, A5
(‘a’ is a constant or label) (‘a’ is a constant or label)
aahh
aahh xx
aall aa
A5A5
Finally, to move the 32-bit address to a Finally, to move the 32-bit address to a register we can use:register we can use:
MVKL a, A5
MVKH a, A5
21
Always use Always use MVMVKLKL then then MVMVKHKH, look at the , look at the following examples:following examples:
Loading the Loading the Pointer Pointer **RnRn
MVKL 0x1234FABC, A5
A5 = 0xFFFFFABC Wrong
Example 1 (A5 = 0x8765432187654321)
MVKL 0x1234FABC, A5
A5 = 0xFFFFFABC (sign extension)
MVKHMVKH 0x0x1234FABC1234FABC, A5 , A5
A5 = 0xA5 = 0x1234FABC1234FABC OK OK
Example 2 (A5 = 0x8765432187654321)
MVKH 0x1234FABC, A5
A5 = 0x12344321
22
LDLDHH, , MVMVKLKL and and MVMVKHKH
MVKMVKLL pt1pt1, A5, A5 MVKMVKHH pt1pt1, A5 , A5
MVKMVKLL pt2pt2, A6, A6 MVKMVKHH pt2pt2, A6, A6
LDLDHH .D.D *A5, A0*A5, A0
LDLDHH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
pt1 and pt2 point to some locations
in the data memory.
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A4A4
A1A155
Register Register File AFile A
..
..
..
a1a1x1x1
prodprod
32-bits32-bits
YY
..
..
..
.D.D.D.D
Data MemoryData MemoryData MemoryData Memory
Le componentiaiai e xixi sono numeri
interi da 16 bits16 bits
23
Creating a loopCreating a loop
MVKL MVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5
MVKL MVKL pt2, A6pt2, A6 MVKH MVKH pt2, A6pt2, A6
LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
So far we have only So far we have only implemented the SOP implemented the SOP for one tap only, i.e.for one tap only, i.e.
Y= aY= a11 ** x x11
So let’s create a So let’s create a looploop so that we can so that we can implement the SOP implement the SOP for N Taps.for N Taps.
24
Creating a loop –Creating a loop – B B instructioninstruction
With the C6000 processors With the C6000 processors there are there are no dedicated no dedicated
instructions such as block instructions such as block repeatrepeat. The loop is created . The loop is created
using the using the BB instructioninstruction..
So far we have only So far we have only implemented the SOP implemented the SOP for one tap only, i.e.for one tap only, i.e.
Y= aY= a11 ** x x11
So let’s create a So let’s create a looploop so that we can so that we can implement the SOP implement the SOP for N Taps.for N Taps.
25
What are the steps for What are the steps for creating a creating a looploop ? ?
Create a Create a labellabel to branch to. to branch to.
Add a Add a branch instructionbranch instruction, B., B.
Create a Create a loop counter loop counter (LC).(LC).
Add an instruction toAdd an instruction to decrement the LC decrement the LC..
Make the Make the branch conditionalbranch conditional based on LCbased on LC..
26
1. Create a1. Create a label label to branch to branch toto
MVKLMVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5
MVKL MVKL pt2, A6pt2, A6 MVKH MVKH pt2, A6pt2, A6
looploop LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4 A4, A3, A4
27
MVKLMVKL pt1, A5pt1, A5 MVKH MVKH pt1, A5 pt1, A5
MVKL MVKL pt2, A6pt2, A6 MVKH MVKH pt2, A6pt2, A6
loop loop LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
BB .?.? looploop
2. Add a 2. Add a branch branch instructioninstruction, B., B.
28
Which Which unitunit is used by the B is used by the B instruction?instruction?
Data MemoryData MemoryData MemoryData Memory
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.S.S.S.S
MVKLMVKL .S.S pt1, A5pt1, A5 MVKH MVKH .S.S pt1, A5 pt1, A5
MVKL MVKL .S.S pt2, A6pt2, A6 MVKH MVKH .S.S pt2, A6pt2, A6
loop loop LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
BB .S.S looploop
29
3. Create a 3. Create a loop loop countercounter..
MVKLMVKL .S .S pt1, A5pt1, A5 MVKH MVKH .S .S pt1, A5 pt1, A5
MVKL MVKL .S .S pt2, A6pt2, A6 MVKH MVKH .S .S pt2, A6pt2, A6
MVKLMVKL .S.S count, B0count, B0
loop loop LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
BB .S.S looploop
B registersB registers will be introduced later will be introduced later
Data MemoryData MemoryData MemoryData Memory
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.S.S.S.S
30
4. 4. Decrement the loop Decrement the loop countercounter
MVKLMVKL .S .S pt1, A5pt1, A5 MVKH MVKH .S .S pt1, A5 pt1, A5
MVKL MVKL .S .S pt2, A6pt2, A6 MVKH MVKH .S .S pt2, A6pt2, A6
MVKLMVKL .S.S count, count, B0B0
loop loop LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
SUBSUB .S.S B0, 1, B0B0, 1, B0
BB .S.S looploop
Data MemoryData MemoryData MemoryData Memory
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.S.S.S.S
31
What is the syntax for making instruction What is the syntax for making instruction conditional?conditional?
[[conditioncondition] ] InstructionInstructionLabelLabel
e.g.e.g. [[B1B1]] BB looploop
(1) (1) The condition can be one of the following The condition can be one of the following registers: registers: A1A1, , A2A2, , B0B0, , B1B1, , B2B2..
(2) (2) Any instruction can be conditional.Any instruction can be conditional.
5. Make the branch conditional 5. Make the branch conditional based on the value in the loop based on the value in the loop
countercounter
32
The condition can be inverted by adding the The condition can be inverted by adding the exclamation symbol exclamation symbol ““!!”” as follows: as follows:
[[!!condition] condition] InstructionInstruction LabelLabel
e.g.e.g.
[[!!B0]B0] BB loop; branch if loop; branch if B0B0 = = 00
[B0][B0] BB loop; branch if loop; branch if B0B0 != != 00
5. Make the branch conditional 5. Make the branch conditional based on the value in the loop based on the value in the loop
countercounter
33
MVKLMVKL .S2.S2 pt1, A5pt1, A5 MVKH MVKH .S2.S2 pt1, A5 pt1, A5
MVKL MVKL .S2.S2 pt2, A6pt2, A6 MVKH MVKH .S2.S2 pt2, A6pt2, A6
MVKLMVKL .S2.S2 count, count, B0B0
loop loop LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
SUBSUB .S.S B0, 1, B0, 1, B0B0
[B0][B0] BB .S.S looploop
Data MemoryData MemoryData MemoryData Memory
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.M.M.M.M
.L.L.L.L
A0A0
A1A1
A2A2
A3A3
A15A15
Register File ARegister File A
..
..
..
aaxx
prodprod
32-bits32-bits
YY
.D.D.D.D
.S.S.S.S
5. Make the 5. Make the branch branch conditionalconditional
34
With this processor all the instructions are With this processor all the instructions are encoded in a 32-bit.encoded in a 32-bit.
Therefore the Therefore the labellabel must have a dynamic range must have a dynamic range of less than 32-bit as the instruction B has to be of less than 32-bit as the instruction B has to be coded.coded.
Case 1: Case 1: BB .S1.S1 labellabel RelativeRelative branch. branch.
Label limited to Label limited to +/- 2+/- 22020 offset offset..
More on the Branch Instruction More on the Branch Instruction (1)(1)
21-bit21-bit relativerelative address addressBB
32-bit32-bit
35
More on the Branch Instruction More on the Branch Instruction (2)(2)
By specifyingBy specifying a register as an operand a register as an operand instead instead of a label, it is possible to have an of a label, it is possible to have an absoluteabsolute branch.branch.
This will allow a dynamic range of This will allow a dynamic range of 223232..
Case 2: Case 2: B B .S2.S2 registerregister AbsoluteAbsolute branch. branch.
Operates on .S2 ONLY!Operates on .S2 ONLY!
5-bit register 5-bit register codecodeBB
32-bit32-bit
36
Testing the codeTesting the code
This code performsThis code performs
the following operations:the following operations:
aa00*x*x00 + a + a00*x*x00 + a + a00*x*x00 + +
… … + a+ a00*x*x00
However, we would like to perform:However, we would like to perform:
aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +
… … + a+ aNN*x*xNN
MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5
MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6
MVKLMVKL .S2.S2 count, B0count, B0
loop loop LDHLDH .D.D *A5, A0*A5, A0
LDHLDH .D.D *A6, A1*A6, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
SUBSUB .S.S B0, 1, B0B0, 1, B0
[B0][B0] BB .S.S looploop
37
Modifying the pointersModifying the pointers
The solution isThe solution is
to modify to modify
the pointersthe pointers
A5A5 and and A6A6
MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5
MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6
MVKLMVKL .S2.S2 count, B0count, B0
loop loop LDHLDH .D.D *A5*A5, A0, A0
LDHLDH .D.D *A6*A6, A1, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
SUBSUB .S.S B0, 1, B0B0, 1, B0
[B0][B0] BB .S.S looploop
38
Indexing PointersIndexing Pointers
DescriptionDescription
PointerPointer
+ Pre-offset+ Pre-offset
- Pre-offset- Pre-offset
Pre-incrementPre-increment
Pre-decrementPre-decrement
Post-incrementPost-increment
Post-decrementPost-decrement
SyntaxSyntax PointerPointerModifiedModified
*R*R
*+R[*+R[dispdisp]]
*-R[*-R[dispdisp]]
*++R[*++R[dispdisp]]
*--R[*--R[dispdisp]]
*R++[*R++[dispdisp]]
*R--[*R--[dispdisp]]
NoNo
NoNo
NoNo
YesYes
YesYes
YesYes
YesYes
[disp] specifies # elements - size in DW, W, H, or B.[disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant.disp = R or 5-bit constant. R can be any register.R can be any register.
39
Modify and testing the Modify and testing the codecode
This code now performsThis code now performs
the following operations:the following operations:
aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +
... + a... + aNN*x*xNN
MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5
MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6
MVKLMVKL .S2.S2 count, B0count, B0
loop loop LDHLDH .D.D *A5*A5++++, A0, A0
LDHLDH .D.D *A6*A6++++, A1, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
SUBSUB .S.S B0, 1, B0B0, 1, B0
[B0][B0] BB .S.S looploop
40
Store the final resultStore the final result
This code now performsThis code now performs
the following operations:the following operations:
aa00*x*x00 + a + a11*x*x11 + a + a22*x*x22 + +
... + a... + aNN*x*xNN
MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5
MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6
MVKLMVKL .S2.S2 count, B0count, B0
loop loop LDHLDH .D.D *A5++, A0*A5++, A0
LDHLDH .D.D *A6++, A1*A6++, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4, A3, A4A4
SUBSUB .S.S B0, 1, B0B0, 1, B0
[B0][B0] BB .S.S looploop
STHSTH .D.D A4, *A7A4, *A7
41
Store the final resultStore the final result
The Pointer The Pointer A7A7
is now initialised.is now initialised.
MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5
MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6
MVKLMVKL .S2.S2 pt3, A7pt3, A7MVKHMVKH .S2.S2 pt3, A7pt3, A7MVKLMVKL .S2.S2 count, B0count, B0
loop loop LDHLDH .D.D *A5++, A0*A5++, A0
LDHLDH .D.D *A6++, A1*A6++, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4, A3, A4A4, A3, A4
SUBSUB .S.S B0, 1, B0B0, 1, B0
[B0][B0] BB .S.S looploop
STHSTH .D.D A4, A4, *A7*A7
42
What is the initial value of What is the initial value of A4?A4?
A4A4 is used as is used as
an accumulator,an accumulator,
so it needs to beso it needs to be
reset to zero.reset to zero.
MVKLMVKL .S2 .S2 pt1, A5pt1, A5 MVKH MVKH .S2 .S2 pt1, A5 pt1, A5
MVKL MVKL .S2 .S2 pt2, A6pt2, A6 MVKH MVKH .S2 .S2 pt2, A6pt2, A6
MVKLMVKL .S2.S2 pt3, A7pt3, A7MVKHMVKH .S2.S2 pt3, A7pt3, A7MVKLMVKL .S2.S2 count, B0count, B0ZEROZERO .L.L A4A4
loop loop LDHLDH .D.D *A5++, A0*A5++, A0
LDHLDH .D.D *A6++, A1*A6++, A1
MPYMPY .M.M A0, A1, A3A0, A1, A3
ADDADD .L.L A4A4, A3, , A3, A4A4
SUBSUB .S.S B0, 1, B0B0, 1, B0
[B0][B0] BB .S.S looploop
STHSTH .D.D A4, *A7A4, *A7