expo.nikkeibp.co.jp · bpo (eap) 'bpo (btm) new new new new new new new new
New Armlect1
Transcript of New Armlect1
-
8/3/2019 New Armlect1
1/80
1 1
ARM
The ARM Instruction SetARMAdvanced RISC Machines
-
8/3/2019 New Armlect1
2/80
2 2
ARM
P rocessor Modes
The ARM has six operating modes: User User (unprivileged mode under which most tasks run) (10000) FIQFIQ (entered when a high priority (fast) interrupt is raised)
(10001) IRQIRQ (entered when a low priority (normal) interrupt is raised)
(10010) Supervisor Supervisor (entered on reset and when a Software Interrupt
instruction is executed) (10011) Abort Abort (used to handle memory access violations) (10111) Undef Undef (used to handle undefined instructions) (11011)
ARM Architecture Version 4 adds a seventh mode: SystemSystem (privileged mode using the same registers as user
mode) (11111)
-
8/3/2019 New Armlect1
3/80
3 3
ARM
ARM has 37 registers in total, all of which are 32-bitslong.
1 dedicated program counter 1 dedicated current program status register
5 dedicated saved program status registers 30 general purpose registers
H owever these are arranged into several banks, with theaccessible bank being governed by the processor mode.Each mode can access
a particular set of r0-r12 registers a particular r13 (the stack pointer) and r14 (link register) r15 (the program counter) cpsr (the current program status register)
and privileged modes can also access a particular spsr (saved program status register)
The Registers
-
8/3/2019 New Armlect1
4/80
-
8/3/2019 New Armlect1
5/80
5 5
ARM
Accessing Registers usingARM Instructions
N o breakdown of currently accessible registers. All instructions can access r0-r14 directly. Most instructions also allow use of the PC.
Specific instructions to allow access to CPSRand SPSR.
-
8/3/2019 New Armlect1
6/80
6 6
ARM
The P rogram Status Registers(C P SR and S P SRs)
Co pies o f the ALU s tatus f lags (l atched if theinstructio n has the "S" bit set).
N = N egati ve result fr om ALU f lag.Z = Z er o result fr om ALU f lag.C = ALU o perati o n C arried o utV = ALU o perati o n o Verf low ed
* Interrupt Disa b le b its.I = 1, disables the IRQ.F = 1, disables the FIQ.
* T Bit ( A rchitecture v4T only)T = 0, P r o cesso r in ARM s tateT = 1, P r o cesso r in Thum b state
* Condition Code Flags
M odeN Z C V
2831 8 4 0
I F T
* M ode BitsM [4:0] define the pr o cesso r mo de.
-
8/3/2019 New Armlect1
7/80
7 7
ARM
L ogical Instruction A rithmetic Instruction
Flag
Negati ve No m eaning Bit 31 o f the result has been set(N=1) Indicates a negati ve num ber in
signed o perati o ns
Zer o R esult is all z er o es R esult o f o perati o n w aszer o(Z=1)
C arry After Shift o perati o n R esult w as greater than 32 bits(C =1) 1w as left in carry f lag
o Verf low No m eaning R esult w as greater than 31 bits(V=1) Indicates a po ss ible
co rru ptio n o f the sign bit in signed
Condition Flags
-
8/3/2019 New Armlect1
8/80
8 8
ARM
W hen the processor is executing in ARM state: All instructions are 32 bits in length All instructions must be word aligned Therefore the PC value is stored in bits [31:2] with bits [1:0]
equal to zero (as instruction cannot be halfword or bytealigned).
R14 is used as the subroutine link register (LR) andstores the return address when Branch with Linkoperations are performed,calculated from the PC.Thus to return from a linked branch
MOV r15,r14
or
The P rogram Counter (R15)
-
8/3/2019 New Armlect1
9/80
9 9
ARM
Register Example:User to FIQ Mode
spsr_fiq
cpsr
r7
r4r 5
r2r 1
r 0
r3
r6
r 15 (pc)r 14_fiq
r 13_fiq
r 12_fiq
r 10 _fiq
r 11 _fiq
r9_fiq
r8_fiq
r 14 (lr )
r 13 (sp)
r 12
r 10
r 11
r9
r8
U ser mode CPS R copied to FIQ mode SPS R
cpsr
r 15 (pc)r 14 (lr )
r 13 (sp)
r 12
r 10
r 11
r9
r8
r7
r4r 5
r2r 1
r 0
r3
r6
r 14_fiq
r 13_fiq
r 12_fiq
r 10 _fiq
r 11 _fiq
r9_fiq
r8_fiq
R eturn address calculated from U ser modePC value and stored in FIQ mode LR
R egisters in use R egisters in use
EXCEPTIO N
U ser M ode FIQ M ode
spsr_fiq
D isable FIQ
-
8/3/2019 New Armlect1
10/80
10 10
ARM
Exceptions
E xceptions generated as the direct effect of executing
an instruction Software Interrupts, Undefined Instructions & Prefetch AbortsE xceptions generated as a side effect of an Instruction Data Aborts (Caused by Load/Store Instructions)
E xceptions generated externally, unrelated toInstruction flow Reset, IRQ, FIQ
-
8/3/2019 New Armlect1
11/80
11 11
ARM
Exception Entry
Changes Operating Mode
Save Address of next Instruction in r14 of thenew modeSaves Old value of CPSR into SPSR of newmode
Disables either IRQ or FIQ if the exception isIRQ or FIQ respectivelyForces PC to vector to new address
-
8/3/2019 New Armlect1
12/80
12 12
ARM
Exception Return
Any modified user registers should be restored
from the StackThe CPSR should be restored from appropriateSPSRThe PC must be changed to relevant User Instruction Stream
P roblem: Last two cannot be carried outindependently.
-
8/3/2019 New Armlect1
13/80
13 13
ARM
Solution
To return from S W I MOVS pc, r14To return from IRQ, FIQ or Prefetch Abort SUBS pc, r14, #4
To return from Data Aborts SUBS pc, r14, #8
The S modifier signifies special form of Instruction when the destination is P C
This way only if S P SR, r14 stored onto the Stack:
LD MFD r13!, {r0-r3,pc} ^
-
8/3/2019 New Armlect1
14/80
14 14
ARM
P riority
Reset
Data AbortFIQIRQPrefetch Abort
S W I, Undefined Instruction
There should be some doubt
-
8/3/2019 New Armlect1
15/80
15 15
ARM
W hen an exception occurs, thecore: Copies CPSR into SPSR_ Sets appropriate CPSR bits
If core implements ARM Architecture 4Tand is currently in Thumb state, then
ARM state is entered.Mode field bits
Interrupt disable flags if appropriate. Maps in appropriate banked registers Stores the return address in
LR_ Sets PC to vector address
To return, exception handler
Exception H andlingand the Vector Table
0x00000000
0x0000001 C
0x0000001 8
0x0000001 4
0x00000010
0x0000000 C
0x0000000 8
0x0000000 4
R eset
U ndefined Instruction
FIQ
IR Q
R eserved
Data Ab ort
Prefetch Ab ort
Software Interrupt
-
8/3/2019 New Armlect1
16/80
16 16
ARM
The Instruction P ipeline
The ARM uses a pipeline in order to increase thespeed of the flow of instructions to the processor.
Allows several operations to be undertaken simultaneously,rather than serially.FE TCH
DE COD E
EXE CUT E
Instruction fetched from memory
Decoding of registers used in instruction
Register(s) read from Register BankShift and ALU operationW rite register(s) back to Register Bank
PC
PC - 4
PC - 8
ARM
-
8/3/2019 New Armlect1
17/80
17 17
ARM
Quiz #1 - Verbal
W hat registers are used to store the program counter
and link register?W hat is r13 often used to store?W hich mode, or modes has the fewest availablenumber of registers available? How many and why?N
ame the exceptions in the ARM?Mention their priorities.W hat happens on E xception E ntry & E xception E xit?
-
8/3/2019 New Armlect1
18/80
18 18
ARM
ARM Instruction Set Format
Instruction typeD ata processing /
P SR Transfer
Multiply
Long Multiply(v3M / v4 only)
Swap
Load/StoreByte/Word
Load/Store MultipleH alfword transfer :
Immediate offset (v4only)
H alfword transfer:Re ister offset v4 onl
Cond 0 0 I Opc ode S R n Rd Ope ra nd2
Cond 0 0 0 0 0 0 A S R d Rn Rs 1 0 0 1 Rm
Cond 0 0 0 1 0 B 0 0 R n Rd 0 0 0 0 1 0 0 1 Rm
Cond 0 1 I P U B W L R n Rd Offs et
Cond 1 0 0 P U S W L R n Regi s te r L i s t
Cond 0 0 0 0 1 U A S R dHi Rd Lo Rs 1 0 0 1 Rm
Cond 0 0 0 P U 1 W L R n Rd Offs et 1 1 S H 1 Offs et2
Cond 1 0 1 L Offs et
Cond 1 1 0 P U N W L R n CRd CPNum Offs et
Cond 1 1 1 0 Op1 CRn CRd CPNum Op 2 0 CRm
Cond 1 1 1 0 Op1 L CRn Rd CPNum Op 2 1 CRm
Cond 1 1 1 1 SWI Numb e r
Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 R n
Cond 0 0 0 P U 0 W L R n Rd 0 0 0 0 1 S H 1 Rm
31 2827 1615 87 0
-
8/3/2019 New Armlect1
19/80
19 19
ARM
Conditional Execution
Most instruction sets only allow branches to beexecuted conditionally.However by reusing the condition evaluation hardware,
ARM effectively increases number of instructions. All instructions contain a condition field which determines
whether the CPU will execute them. N on-executed instructions soak up 1 cycle.
Still have to complete cycle so as to allow fetching and decodingof following instructions.
This removes the need for many branches, which stallthe pipeline (3 cycles to refill). Allows very dense in-line code, without branches. The Time penalty of not executing several conditional
-
8/3/2019 New Armlect1
20/80
20 20
ARM
The Condition Field
2831 24 20 16 12 8 4 0
Cond
0000 = EQ - Z s et (equa l)
0001 = NE - Z cl ear (no t equa l)
0010 = HS / C S - C set (unsigned higher o r sam e)
0011 = LO / CC - C clear (unsigned low er )
0100 = MI -N s et (negati ve)
0101 = PL - N cl ear (po sitive o r zer o )
0110 = VS - V set (o verf low )
0111 = VC - V clear (no o verf low )
1000 = HI - C set and Z clear (unsigned higher )
1001 = LS - C clear or Z (setunsigned lower or same)
1010 = GE - N set and V set,or N clear and V clear (>or =)
1011 = LT - N set and Vclear, or N clear and Vset (>)
1100 = GT - Z clear, andeither N set and V set,or N clear and V set(>)
-
8/3/2019 New Armlect1
21/80
21 21
ARM
Using and updating the ConditionField
To execute an instruction conditionally, simply postfix itwith the appropriate condition: For example an add instruction takes the form:
ADD r0,r1,r 2 ; r0 = r1 + r 2 (ADDAL) To execute this only if the zero flag is set:
ADDEQ r0,r1,r 2 ; If z e r o fla g s et then; ... r0 = r1 + r 2
By default, data processing operations do not affect thecondition flags (apart from the comparisons where thisis the only effect). To cause the condition flags to beupdated, the S bit of the instruction needs to be set bypostfixing the instruction (and any condition code) with
an S .
-
8/3/2019 New Armlect1
22/80
22 22
ARM
Branch : B{} labelBranch with Link : BL{}sub_routine_label
The offset for branch instructions is calculated bythe assembler:
By taking the difference between the branch instructionand the target address minus 8 (to allow for thepipeline).
This gives a 26 bit offset which is right shifted 2 bits (as
Branch instructions (1)
2831 24 0
Cond 1 0 1 L Offset
Condition field
L ink b it 0 = Bran ch1 = Bran ch w ith link
232527
-
8/3/2019 New Armlect1
23/80
23 23
ARM
Branch instructions (2)
W hen executing the instruction, the processor: shifts the offset left two bits, sign extends it to 32 bits, and adds
it to PC.E xecution then continues from the new PC, once thepipeline has been refilled.The "Branch with link" instruction implements asubroutine call by writing PC-4 into the LR of thecurrent bank.
i.e. the address of the next instruction following the branch withlink (allowing for the pipeline).
To return from subroutine, simply need to restore thePC from the LR:
MOV pc, lr
-
8/3/2019 New Armlect1
24/80
24 24
ARM
D ata processing Instructions
Largest family of ARM instructions, all sharing thesame instruction format.Contains: Arithmetic operations Comparisons (no results - just set condition codes) Logical operations Data movement between registers
Remember, this is a load / store architecture These instruction only work on registers, NOT NOT memory.
They each perform a specific operation on one or twooperands. First operand always a register - Rn
-
8/3/2019 New Armlect1
25/80
25 25
ARM
Arithmetic Operations
Operations are: ADD operand1 + operand2 ADC operand1 + operand2 + carry SUB operand1 - operand2 SBC operand1 - operand2 + carry -1 RSB operand2 - operand1 RSC operand2 - operand1 + carry - 1
Syntax: {}{S} Rd, Rn, Operand2
E xamples ADD r0, r1, r2
SUBGT r3, r3, #1 RSBL E S r4, r5, #5
-
8/3/2019 New Armlect1
26/80
26 26
ARM
Comparisons
The only effect of the comparisons is to UPDA TE THE CON DI T I ON FL AGS UPDA TE THE CON DI T I ON FL AGS. Thus no need to set S
bit.Operations are: CMP operand1 - operand2, but result not written CMN operand1 + operand2, but result not written TST operand1 A N D operand2, but result not written TE Q operand1 E OR operand2, but result not written
Syntax: {} Rn, Operand2
E xamples:
CMP r0, r1 TST E Q r2 #5
-
8/3/2019 New Armlect1
27/80
27 27
ARM
Logical Operations
Operations are: AN D operand1 A N D operand2 E OR operand1 E OR operand2 ORR operand1 OR operand2 BIC operand1 A N D N OT operand2 [ie bit clear]
Syntax: {}{S} Rd, Rn, Operand2
E xamples: AN D r0, r1, r2 E ORS r1,r3,r0
-
8/3/2019 New Armlect1
28/80
28 28
ARM
D ata Movement
Operations are: MOV operand2 MVN N OT operand2
N ote that these make no use of operand1.Syntax: {}{S} Rd, Operand2
E xamples: MOV r0, r1 MOVS r2, #10 MVNE Q r1,#0
-
8/3/2019 New Armlect1
29/80
29 29
ARM
Quiz #2Start
Stopr0 = r1?
r0 > r1?
r0 = r0 - r1 r1 = r1 - r0
Yes
No Yes
No
Convert the GCDalgorithm given in thisflowchart into
1) N ormal assembler,where only branchescan be conditional.
2)ARM assembler,
where all instructionsare conditional, thusimproving codedensity.
The only instructionsyou need are CMP, B
-
8/3/2019 New Armlect1
30/80
30 30
ARM
Quiz #2 - Sample Solutions
ARM Conditional Assembler
g c d cmp r0, r1 ;i f r0 > r1sub gt r0, r0, r1 ; sub t rac t r1
fr o m r0subl t r1, r1, r0 ;e ls e sub t rac t
r0 fr o m r1b ne g c d ; r e ac hed the
end?
-
8/3/2019 New Armlect1
31/80
31 31
ARM
The Barrel Shifter
The ARM doesnt have actual shift instructions.
Instead it has a barrel shifter which provides amechanism to carry out shifts as part of other instructions.
So what operations does the barrel shifter support?
-
8/3/2019 New Armlect1
32/80
32 32
ARM
Shifts left by the specified amount (multiplies
by powers of two) e.g.LSL #5 = multiply by 32
Barrel Shifter - Left Shift
L ogical Shift L eft ( L SL )
DestinationCF 0
-
8/3/2019 New Armlect1
33/80
33 33
ARM
Logical Shift RightShifts right bythe specifiedamount (dividesby powers of two)e.g.
LSR #5 = divideby 32
Arithmetic ShiftRightShifts right
(divides byowers of two
Barrel Shifter - Right Shifts
Destination CF
Destination CF
L ogical Shift R ight
A rithmetic Shift R ight
...0
Sign b it shifted in
-
8/3/2019 New Armlect1
34/80
34 34
ARM
Barrel Shifter - RotationsRotate Right (ROR)Similar to an ASR butthe bits wrap aroundas they leave the LSB
and appear as theMSB.
e.g. ROR #5N ote the last bitrotated is also usedas the Carry Out.
Rotate Right E xtended(RR X )
This operation uses
Destination CF
R otate R ight
Destination CF
R otate R ight through Carry
-
8/3/2019 New Armlect1
35/80
35 35
ARM
Using the Barrel Shifter:The Second Operand
* Immediate value8 bit num ber C an be r o tated right thr o ugh an even num ber o f
po sitio ns.Ass em bler w ill calculate r o tate f o r yo u fr om co nstant .
Register, optionallywith shift operationapplied.Shift value can be
either be: 5 bit unsigned integer Specified in bottom
byte of another register.
Operand1
Result
ALU
BarrelShifter
Operand2
-
8/3/2019 New Armlect1
36/80
36 36
ARM
Second Operand :Shifted Register
The amount by which the register is to be shiftedis contained in either: the immediate 5-bit field in the instruction
NO OVE RH E ADNO OVE RH E ADShift is done for free - executes in single cycle.
the bottom byte of a register (not PC)Then takes extra cycle to execute
ARM doesnt have enough read ports to read 3 registersat once.Then same as on other processors where shift isseparate instruction.
If no shift is specified then a default shift isapplied: LSL #0 i.e. barrel shifter has no effect on value in register.
-
8/3/2019 New Armlect1
37/80
37 37
ARM
Second Operand :Using a Shifted Register
Using a multiplication instruction to multiply by aconstant means first loading the constant into aregister and then waiting a number of internal cyclesfor the instruction to complete.
A more optimum solution can often be found by usingsome combination of MOVs, ADDs, SUBs and RSBswith shifts. Multiplications by a constant equal to a ((power of 2) 1) can
be done in one cycle.E xample: r0 = r1 * 5E xample: r0 = r1 + (r1 * 4)
ADD r0, r1, r1, LSL #2E xam le: r2 = r3 * 105
-
8/3/2019 New Armlect1
38/80
38 38
ARM
Second Operand :Immediate Value (1)
There is no single instruction which will load a 32 bitimmediate constant into a register without performing adata load from memory. All ARM instructions are 32 bits long ARM instructions do not use the instruction stream as data.
The data processing instruction format has 12 bitsavailable for operand2 If used directly this would only give a range of 4096.
Instead it is used to store 8 bit constants, giving arange of 0 - 255.These 8 bits can then be rotated right through an even
number of positions (ie RORs by 0, 2, 4,..30). This ives a much lar er ran e of constants that can be
-
8/3/2019 New Armlect1
39/80
39 39
ARM
Second Operand :Immediate Value (2)
This gives us: 0 - 255 [0 - 0xff] 256,260,264,..,1020 [0x100-0x3fc, step 4, 0x40-
0xff ror 30] 1024,1040,1056,..,4080 [0x400-0xff0, step 16, 0x40-0xff ror
28] 4096,4160, 4224,..,16320 [0x1000-0x3fc0, step 64, 0x40-0xff
ror 26]These can be loaded using, for example:
MOV r0, #0x40, 26 ; => MOV r0, #0x1000 (ie4096)
To make this easier, the assembler will convert to this
form for us if simply given the required constant:=
-
8/3/2019 New Armlect1
40/80
40 40
ARM
Loading full 32 bit constants
Although the MOV/MV N mechansim will load a largerange of constants into a register, sometimes thismechansim will not generate the required constant.Therefore, the assembler also provides a methodwhich will load ANY ANY 32 bit constant:
LDR r d ,= n ume r i c c on s t a nt
If the constant can be constructed using either a MOVor MV N then this will be the instruction actuallygenerated.Otherwise, the assembler will produce an LDRinstruction with a PC-relative address to read the
constant from a literal pool. LDR r0,=0x4 2 ; gene ra te s MOV r0,#0x4 2
-
8/3/2019 New Armlect1
41/80
41 41
ARM
Multiplication Instructions
The Basic ARM provides two multiplicationinstructions.Multiply MUL{}{S} Rd, Rm, Rs ; Rd = Rm * Rs
Multiply Accumulate - does addition for free MLA{}{S} Rd, Rm, Rs,Rn ; Rd = (Rm * Rs) + Rn
Restrictions on use: Rd and Rm cannot be the same register
Can be avoid by swapping Rm and Rs around. This worksbecause multiplication is commutative.
Cannot use PC.
These will be picked up by the assembler if overlooked.
-
8/3/2019 New Armlect1
42/80
42 42
ARM
Multiplication Implementation
The ARM makes use of Booths Algorithm to performinteger multiplication.On non-M ARMs this operates on 2 bits of Rs at a time.
For each pair of bits this takes 1 cycle (plus 1 cycle to startwith).
However when there are no more 1s left in Rs, themultiplication will early-terminate.
E xample: Multiply 18 and -1 : Rd = Rm * Rs
0 0 0 0 0 0 1 00 0 0 10 0 0 00 0 0 00 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 11 1 1 11 1 1 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1
R m
R s
17 cycles
R s
R m4 cycles
18
-1
18
-1
-
8/3/2019 New Armlect1
43/80
43 43
ARM
Extended Multiply Instructions
M variants of ARM cores contain extendedmultiplication hardware. This provides threeenhancements:
An 8 bit Booths Algorithm8 bit Booths Algorithm is usedMultiplication is carried out faster (maximum for standardinstructions is now 5 cycles).
E arly termination method improved E arly termination method improved so that now completes
multiplication when all remaining bit sets containall zeroes (as with non-M ARMs), or all ones.
Thus the previous example would early terminate in 2cycles in both cases.
64 bit results64 bit results can now be produced from two 32bitoperands
-
8/3/2019 New Armlect1
44/80
44 44
ARM
Multiply-Long andMultiply-Accumulate Long
Instructions are MULL which gives RdHi,RdLo:=Rm*Rs MLAL which gives RdHi,RdLo:=(Rm*Rs)+RdHi,RdLo
However the full 64 bit of the result now matter (lower precision multiply instructions simply throws top 32bitsaway)
N eed to specify whether operands are signed or unsigned
Therefore syntax of new instructions are: UMULL{}{S} RdLo,RdHi,Rm,Rs UMLAL{}{S} RdLo,RdHi,Rm,Rs SMULL{}{S} RdLo, RdHi, Rm, Rs SMLAL{}{S} RdLo, RdHi, Rm, Rs
-
8/3/2019 New Armlect1
45/80
45 45
ARM
Quiz #31. Specify instructions which will implement the following:a) r0 = 16 b) r1 = r0 * 4
c) r0 = r1 / 16 ( r1 signed 2's comp.) d) r1 = r2 * 7
2. W hat will the following instructions do?a) ADDS r0, r1, r1, LSL #2 b) RSB r2, r1, #0
3. W hat does the following instruction sequence do? ADD r0, r1, r1, LSL #1SUB r0, r0, r1, LSL #4
ADD r0, r0, r1, LSL #7
-
8/3/2019 New Armlect1
46/80
46 46
ARM
Load / Store Instructions
The ARM is a Load / Store Architecture: Does not support memory to memory data processing
operations. Must move data values into registers before using them.
This might sound inefficient, but in practice isnt: Load data values from memory into registers. Process data in registers using a number of data
processing instructions which are not slowed down bymemory access.
Store results from registers out to memory.
The ARM has three sets of instructions whichinteract with main memory. These are: Single register data transfer (LDR / STR).
-
8/3/2019 New Armlect1
47/80
47 47
ARM
Single register data transfer
The basic load and store instructions are: Load and Store W ord or Byte
LDR / STR / LDRB / STRB
ARM Architecture Version 4 also adds support for halfwords and signed data.
Load and Store HalfwordLDRH / STRH
Load Signed Byte or Halfword - load value and sign extend it to32 bits.
LDRSB / LDRSH
All of these instructions can be conditionally executedby inserting the appropriate condition code after STR /LDR.
-
8/3/2019 New Armlect1
48/80
48 48
ARM
Load and Store Word or Byte:Base Register
The memory location to be accessed is held in
a base register STR r0, [r1] ; Store contents of r0 to location
pointed to; by contents of r1.
LDR r2, [r1] ; Load r2 with contents of memorylocation
; pointed to by contents of r1.r1
0x200Base
Register
Memory
0x50x200
r0
0x5Source
Register for STR
r2
0x5D estination
Register for L D R
-
8/3/2019 New Armlect1
49/80
49 49
ARM
Load and Store Word or Byte:Offsets from the Base Register
As well as accessing the actual location contained inthe base register, these instructions can access alocation offset from the base register pointer.This offset can be
An unsigned 12bit immediate value (ie 0 - 4095 bytes). A register, optionally shifted by an immediate value
This can be either added or subtracted from thebase register: Prefix the offset value or register with + (default) or -.
This offset can be applied: before the transfer is made: PrePre- -indexed a ddressing indexed a ddressing
optionally autoauto--incrementing incrementing the base register, by postfixingthe instruction with an !.
-
8/3/2019 New Armlect1
50/80
50 50
ARM
Load and Store Word or Byte:P re-indexed Addressing
E xample: STR r0, [r1,#12]
To store to location 0x1f4 instead use: STR r0, [r1,#-
12]
r1
0x200Base
Register
Memory
0x5
0x200
r0
0x5Source
Register for STR
Offset
12 0x20c
-
8/3/2019 New Armlect1
51/80
51 51
ARM
Load and Store Word or Byte:P ost-indexed Addressing
E xample: STR r0, [r1], #12
To auto-increment the base register to location 0x1f4instead use:
STR r0, [r1], #-12
r1
0x200Original
BaseRegister
Memory
0x50x200
r0
0x5Source
Register for STR
Offset
12 0x20c
r1
0x20c Updated
BaseRegister
-
8/3/2019 New Armlect1
52/80
52 52
ARM
Load and Storeswith User Mode P rivilege
W hen using post-indexed addressing, there is a further form of Load/Store W ord/Byte: {}{B} T Rd,
W hen used in a privileged mode, this does the
load/store with user mode privilege. N ormally used by an exception handler that is emulating amemory access instruction that would normally execute in user mode.
-
8/3/2019 New Armlect1
53/80
53 53
ARM
Example Usage of Addressing Modes
Imagine an array, the first element of which is pointedto by the contents of r0.If we want to access a particular element,then we can use pre-indexed addressing:
r1 is element we want. LDR r2, [r0, r1, LSL #2]
If we want to step through everyelement of the array, for instanceto produce sum of elements in thearray, then we can use post-indexed addressing withina loop:
r1 is address of current element (initially equal to r0).
0
1
2
3
element
0
4
8
12
MemoryOffset
r0
Pointer tostart of array
-
8/3/2019 New Armlect1
54/80
54 54
ARM
Offsets for H alfword and SignedH alfword / Byte Access
The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as thebasic load and store instructions.However the actual offset formats are moreconstrained: The immediate value is limited to 8 bits (rather than 12 bits)
giving an offset of 0-255 bytes. The register form cannot have a shift applied to it.
ARM
-
8/3/2019 New Armlect1
55/80
55 55
ARM
Effect of endianess
The ARM can be set up to access its data in either little or big endian format.Little endian: Least significant byte of a word is stored in b its 0 b its 0 - -7 7 of an
addressed word.
Big endian: Least significant byte of a word is stored in b its 24b its 24 - -31 31 of
an addressed word.This has no real relevance unless data is stored aswords and then accessed in smaller sizedquantities (halfwords or bytes). W hich byte / halfword is accessed will depend on the
endianess of the system involved.
ARM
-
8/3/2019 New Armlect1
56/80
56 56
ARM
Endianess Example
Big-endianL ittle-endian
r1 = 0x100
r 0 = 0x11 22334431 24 23 16 15 8 7 0
11 22 33 44
31 24 23 16 15 8 7 0
11 22 33 44
31 24 23 16 15 8 7 0
44 33 22 11
31 24 23 16 15 8 7 0
00 00 00 44
31 24 23 16 15 8 7 0
00 00 00 11
r2 = 0x 44 r2 = 0x11
STR r0, [r1]
LD RB r2, [r1]
r1 = 0x100Memo ry
ARM
-
8/3/2019 New Armlect1
57/80
57 57
ARM
Quiz #4
W rite a segment of code that add together elements xto x+(n-1) of an array, where the element x=0 is thefirst element of the array.E ach element of the array is word sized (ie. 32 bits).The segment should use post-indexed addressing.
At the start of your segments, you should assume that: r0 points to the start of the array. r1 = x r2 = n r0
xx + 1
x + (n - 1)
Elements
{n elements
0
ARM
-
8/3/2019 New Armlect1
58/80
58 58
ARM
Quiz #4 - Sample SolutionADD r0, r0, r1, LSL# 2 ; Set r0 to a dd r e ss o fe l e ment xADD r 2 , r0, r 2 , LSL# 2 ; Set r 2 to a dd r e ss o fe l e ment n +1MOV r1, #0 ; I niti al i s e c o u nte r
l oo pLDR r3, [r0], #4 ; Acc e ss e l e ment a nd move to ne x t
ADD r1, r1, r3 ; Add c ontent s to c o u nte rCMP r0, r 2 ; Ha ve we r e ac hed e l e ment x+ n?
BLT l oo p ; If not - r e p e a t f o r
; ne x t e l e ment
ARM
-
8/3/2019 New Armlect1
59/80
59 59
ARM
BlockD
ata Transfer (1)
Cond 1 0 0 P U S W L R n R egister list
Condition field Base registerL oad/Store b it0 = S to re to m emo ry1 = L o ad fr om m emo ry
Write- b ack b it0 = no w rite - back 1 = w rite addre ss into base
PS R and force user b it0 = do nt lo ad PSR o r f o r ce user mo de1 = l o ad PSR o r f o r ce user mo de
U p/Down b it0 = D ow n; subtra ct o ff set fr om base1 = Up ; add o ff set to base
Pre/Post indexing b it0 = P o st; add o ff set after tran sfer ,1 = P re ; add o ff set bef o re tran sfer
2831 22 16 023 21 1527 20 1924
Each b it corresponds to a particular
register. For example:Bit 0 set causes r 0 to be tran sferred .Bit 0 unset causes r 0 no t to be tran sferred .
A t least one register must b etransferred as the list cannot b e empty.
The Load and Store Multiple instructions (LDM
/ STM) allow betweeen 1 and 16 registers to betransferred to or from memory.The transferred registers can be either: Any subset of the current bank of registers (default).
Any subset of the user mode bank of registers whenin a priviledged mode (postfix instruction with a ^ ).
ARM
-
8/3/2019 New Armlect1
60/80
60 60
ARM
BlockD
ata Transfer (2)
Base register used to determine where memory accessshould occur. 4 different addressing modes allow increment and decrement
inclusive or exclusive of the base register location. Base register can be optionally updated following the transfer
(by appending it with an !!. Lowest register number is always transferred to/from lowest
memory location accessed.
These instructions are very efficient for Saving and restoring context
For this useful to view memory as a stack. Moving large blocks of data around memory
For this useful to directly represent functionality of theinstructions.
ARM
-
8/3/2019 New Armlect1
61/80
61 61
ARM
Stacks
A stack is an area of memory which grows asnew data is pushed onto the top of it, andshrinks as data is popped off the top.Two pointers define the current limits of thestack. A base pointer
used to point to the bottom of the stack (the first location).
A stack pointer used to point the current top of the stack.
SP
BA SE
PUSH {1,2,3}
1
23
BA SE
SP
POP
1
2Result of pop = 3
BA SE
SP
ARM
-
8/3/2019 New Armlect1
62/80
62 62
ARM
Stack Operation
Traditionally, a stack grows down in memory, with thelast pushed value at the lowest address. The ARMalso supports ascending stacks, where the stackstructure grows up through memory.The value of the stack pointer can either:
Point to the last occupied address (Full stack)and so needs pre-decrementing (ie before the push)
Point to the next occupied address ( E mpty stack)and so needs post-decrementing (ie after the push)
The stack type to be used is given by the postfix to theinstruction:
STMFD / LDMFD : Full Descending stack STMFA / LDMFA : Full Ascending stack.
ARM
-
8/3/2019 New Armlect1
63/80
63 63
ARM
Stack ExamplesSTMFD sp!,{r0,r1,r3-r5}
r5r4r3
r1r0SP
Old SP
STMED sp!,{r0,r1,r3-r5}
r5r4r3r1
r0SP
Old SP
r5r4r3
r1r0
STMF A sp!,{r0,r1,r3-r5}
SP
Old SP 0x400
0x418
0x3e8
STME A sp!,{r0,r1,r3-r5}
r5r4
r3r1r0
SP
Old SP
-
8/3/2019 New Armlect1
64/80
ARM
-
8/3/2019 New Armlect1
65/80
65 65
ARM
D irect functionality of Block D ata Transfer
W hen LDM / STM are not being used to implementstacks, it is clearer to specify exactly what functionalityof the instruction is: i.e. specify whether to increment / decrement the base pointer,
before or after the memory access.
In order to do this, LDM / STM support a further syntaxin addition to the stack one: STMIA / LDMIA : Increment After STMIB / LDMIB : Increment Before STMDA / LDMDA : Decrement After STMDB / LDMDB : Decrement Before
ARM
-
8/3/2019 New Armlect1
66/80
66 66
ARM
Example: Block Copy Copy a block of memory, which is an exact multiple
of 12 words long from the location pointed to by r12to the location pointed to by r13. r14 points to the endof block to be copied.
; r12 points to the start of the sourcedata
; r14 points to the end of the source data
; r13 points to the start of thedestination dataloop L DMI Ar12!, {r0-r11} ; load 48 bytes
STMI Ar13!, {r0-r11} ; and store them C MP r12, r14 ; check for the end
r13
r14
r12
IncreasingM emory
ARM
-
8/3/2019 New Armlect1
67/80
67 67
ARM
Quiz #5
The contents of registers r0 to r6 need to be swappedaround thus: r0 moved into r3 r1 moved into r4 r2 moved into r6 r3 moved into r5 r4 moved into r0 r5 moved into r1 r6 moved into r2
W rite a segment of code that uses full descendingstack operations to carry this out, and hence requiresno use of any other registers for temporary storage.
ARM
-
8/3/2019 New Armlect1
68/80
68 68
ARM
Quiz #5 - Sample SolutionSTMFD sp!,
{r0-r6}LDMFD sp!,{r3,r4,r6}
r3 = r0r4 = r1r6 = r2
LDMFD sp!,{r5}
r5 = r3
LDMFD sp!,{r0-r2}
r0 = r4r1 = r5r2 = r6
Old SP
r5r4r3r2r1
SP
r6
r0
r5r4
SP
r6
r3
r5SP
r6
r4
SP
ARM
-
8/3/2019 New Armlect1
69/80
69 69
ARM
Atomic operation of a memory read followed by amemory write which moves byte or wordquantities between registers and memory.Syntax: S W P{}{B} Rd, Rm, [Rn]
Swap and Swap Byte Instructions
Rm Rd
Rn
32
1temp
Memory
ARM
-
8/3/2019 New Armlect1
70/80
70 70
ARM
Software Interrupt (SWI)
In effect, a S W I is a user-defined instruction.It causes an exception trap to the S W I hardware vector (thus causing a change to supervisor mode, plus theassociated state saving), thus causing the S W Iexception handler to be called.The handler can then examine the comment field of theinstruction to decide what operation has beenrequested.B makin use of the S W I mechansim, an o eratin
2831 2427 0
Cond 1 1 1 1 Comment field (ignored b y Processor)
Condition Field
23
ARM
-
8/3/2019 New Armlect1
71/80
71 71
ARM
PSR Transfer Instructions
MRS and MSR allow contents of CPSR/SPSR to betransferred from appropriate status register to a general
purpose register. All of status register, or just the flags, can be transferred.
Syntax: MRS{} R d , ; Rd = MSR{} ,Rm ; = Rm MSR{} ,Rm ; = Rm
where = CPSR, CPSR_all, SPSR o r SPSR_all = CPSR_fl g o r SPSR_fl g
Also an immediate form < > < >
ARM
-
8/3/2019 New Armlect1
72/80
72 72
ARM
Using MRS and MSR
Currently reserved bits, may be used in future,therefore:
they must be preserved when altering PSR the value they return must not be relied upon when testing
other bits.
Thus read-modify-write strategy must be followed whenmodifying any PSR:
Transfer PSR to register using MRS
Modify relevant bits
M odeN Z C V
2831 8 4 0
I F T
ARM
-
8/3/2019 New Armlect1
73/80
73 73
ARM
Coprocessors
The ARM architecture supports 16 coprocessorsE ach coprocessor instruction set occupies part of the
ARM instruction set.There are three types of coprocessor instruction Coprocessor data processing Coprocessor (to/from ARM) register transfers Coprocessor memory transfers (load and store to/from
memory) Assembler macros can be used to transform customcoprocessor mneumonics into the generic mneumonicsunderstood by the processor.
A coprocessor may be implemented
ARM
-
8/3/2019 New Armlect1
74/80
74 74
ARM
Coprocessor D
ataP
rocessing
This instruction initiates a coprocessor operationThe operation is performed only on internalcoprocessor state
For example, a Floating point multiply, which multiplies thecontents of two registers and stores the result in a third register
Syntax: CDP{}
,< o pc_1>, CRd , CRn , CRm,{< o pc_ 2 >}
31 28 27 26 2 5 2 4 2 3 2 0 19 1 6 15 1 2 11 8 7 5 4 3 0
Destination Register Source Registers
Opcode
Condition Code Specifier Opcode
Cond 1 1 1 0 opc_1 CRn CRd cp_num opc_2 0 CRm
ARM
-
8/3/2019 New Armlect1
75/80
75 75
Coprocessor Register Transfers
These two instructions move data between ARMregisters and coprocessor registers MRC : Move to Register from Coprocessor MCR : Move to Coprocessor from Register
An operation may also be performed on the data as itis transferred For example a Floating Point Convert to Integer instruction can
be implemented as a register transfer to ARM that alsoconverts the data from floating point format to integer format.
Syntax {}
,< o pc_1>,R d , CRn , CRm,< o pc_ 2 >
31 28 27 26 2 5 2 4 2 3 22 2 1 2 0 19 1 6 15 1 2 11 8 7 5 4 3 0
ARM Source/Dest Register Coprocesor Source/Dest Registers
Opcode
Condition Code Specifier Opcode
Transfer To/From Coprocessor
Cond 1 1 1 0 opc_1 L CRn Rd cp_num opc_2 1 CRm
-
8/3/2019 New Armlect1
76/80
ARM
-
8/3/2019 New Armlect1
77/80
77 77
Coprocessor Memory Transfers (2)
Syntax of these is similar to word transfers between ARM and memory:
{}{} , CRd ,PC relative offset generated if possible, else causes an error.
{}{}
, CRd ,Pre-indexed form, with optional writeback of the base register
{}{} , CRd ,Post-indexed form
where when present causes a long transfer to be performed
(N =1) else causes a short transfer to be performed ( N =0).
ARM
-
8/3/2019 New Armlect1
78/80
78 78
Quiz #6
W rite a short code segment that performs a modechange by modifying the contents of the CPSR The mode you should change to is user mode which has the
value 0x10. This assumes that the current mode is a priveleged mode such
as supervisor mode. This would happen for instance when the processor is reset -
reset code would be run in supervisor mode which would thenneed to switch to user mode before calling the main routine inyour application.
You will need to use MSR and MRS, plus 2 logical operations.M odeN Z C V2831 8 4 0
I F T
ARM
-
8/3/2019 New Armlect1
79/80
79 79
Quiz #6 - Sample Solution
Set up useful constants:
mmask EQU 0x1f ; mask to clear mode bits
userm EQU 0x10 ; user mode value
Start off here in supervisor mode.MRS r0, cpsr ; t ak e a c o py o f the CPSRBI C r0,r0,#mmask ; cl e ar the mode b it sORR r0,r0,#us e rm ; s e l e c t ne w m odeMSR cpsr, r0 ; wr ite back the modi f ied
; CPSR
ARM
-
8/3/2019 New Armlect1
80/80
80 80
Main features of theARM Instruction Set
All instructions are 32 bits long.Most instructions execute in a single cycle.E very instruction can be conditionally executed.
A load/store architecture Data processing instructions act only on registers
Three operand formatCombined ALU and shifter for high speed bit manipulation
Specific memory access instructions with powerful auto-indexing addressing modes.
32 bit and 8 bit data types and also 16 bit data types on ARM Architecture v4.
Flexible multiple register load and store instructions