New Armlect1

8/3/2019 New Armlect1

1/80

1 1

ARM

The ARM Instruction SetARMAdvanced RISC Machines


2/80

2 2

ARM

P rocessor Modes

The ARM has six operating modes: User User (unprivileged mode under which most tasks run) (10000) FIQFIQ (entered when a high priority (fast) interrupt is raised)

(10001) IRQIRQ (entered when a low priority (normal) interrupt is raised)

(10010) Supervisor Supervisor (entered on reset and when a Software Interrupt

instruction is executed) (10011) Abort Abort (used to handle memory access violations) (10111) Undef Undef (used to handle undefined instructions) (11011)

ARM Architecture Version 4 adds a seventh mode: SystemSystem (privileged mode using the same registers as user

mode) (11111)


3/80

3 3

ARM

ARM has 37 registers in total, all of which are 32-bitslong.

1 dedicated program counter 1 dedicated current program status register

5 dedicated saved program status registers 30 general purpose registers

H owever these are arranged into several banks, with theaccessible bank being governed by the processor mode.Each mode can access

a particular set of r0-r12 registers a particular r13 (the stack pointer) and r14 (link register) r15 (the program counter) cpsr (the current program status register)

and privileged modes can also access a particular spsr (saved program status register)

The Registers


4/80


5/80

5 5

ARM

Accessing Registers usingARM Instructions

N o breakdown of currently accessible registers. All instructions can access r0-r14 directly. Most instructions also allow use of the PC.

Specific instructions to allow access to CPSRand SPSR.


6/80

6 6

ARM

The P rogram Status Registers(C P SR and S P SRs)

Co pies o f the ALU s tatus f lags (l atched if theinstructio n has the "S" bit set).

N = N egati ve result fr om ALU f lag.Z = Z er o result fr om ALU f lag.C = ALU o perati o n C arried o utV = ALU o perati o n o Verf low ed

* Interrupt Disa b le b its.I = 1, disables the IRQ.F = 1, disables the FIQ.

* T Bit ( A rchitecture v4T only)T = 0, P r o cesso r in ARM s tateT = 1, P r o cesso r in Thum b state

* Condition Code Flags

M odeN Z C V

2831 8 4 0

I F T

* M ode BitsM [4:0] define the pr o cesso r mo de.


7/80

7 7

ARM

L ogical Instruction A rithmetic Instruction

Flag

Negati ve No m eaning Bit 31 o f the result has been set(N=1) Indicates a negati ve num ber in

signed o perati o ns

Zer o R esult is all z er o es R esult o f o perati o n w aszer o(Z=1)

C arry After Shift o perati o n R esult w as greater than 32 bits(C =1) 1w as left in carry f lag

o Verf low No m eaning R esult w as greater than 31 bits(V=1) Indicates a po ss ible

co rru ptio n o f the sign bit in signed

Condition Flags


8/80

8 8

ARM

W hen the processor is executing in ARM state: All instructions are 32 bits in length All instructions must be word aligned Therefore the PC value is stored in bits [31:2] with bits [1:0]

equal to zero (as instruction cannot be halfword or bytealigned).

R14 is used as the subroutine link register (LR) andstores the return address when Branch with Linkoperations are performed,calculated from the PC.Thus to return from a linked branch

MOV r15,r14

or

The P rogram Counter (R15)


9/80

9 9

ARM

Register Example:User to FIQ Mode

spsr_fiq

cpsr

r7

r4r 5

r2r 1

r 0

r3

r6

r 15 (pc)r 14_fiq

r 13_fiq

r 12_fiq

r 10 _fiq

r 11 _fiq

r9_fiq

r8_fiq

r 14 (lr )

r 13 (sp)

r 12

r 10

r 11

r9

r8

U ser mode CPS R copied to FIQ mode SPS R

cpsr

r 15 (pc)r 14 (lr )

r 13 (sp)

r 12

r 10

r 11

r9

r8

r7

r4r 5

r2r 1

r 0

r3

r6

r 14_fiq

r 13_fiq

r 12_fiq

r 10 _fiq

r 11 _fiq

r9_fiq

r8_fiq

R eturn address calculated from U ser modePC value and stored in FIQ mode LR

R egisters in use R egisters in use

EXCEPTIO N

U ser M ode FIQ M ode

spsr_fiq

D isable FIQ


10/80

10 10

ARM

Exceptions

E xceptions generated as the direct effect of executing

an instruction Software Interrupts, Undefined Instructions & Prefetch AbortsE xceptions generated as a side effect of an Instruction Data Aborts (Caused by Load/Store Instructions)

E xceptions generated externally, unrelated toInstruction flow Reset, IRQ, FIQ


11/80

11 11

ARM

Exception Entry

Changes Operating Mode

Save Address of next Instruction in r14 of thenew modeSaves Old value of CPSR into SPSR of newmode

Disables either IRQ or FIQ if the exception isIRQ or FIQ respectivelyForces PC to vector to new address


12/80

12 12

ARM

Exception Return

Any modified user registers should be restored

from the StackThe CPSR should be restored from appropriateSPSRThe PC must be changed to relevant User Instruction Stream

P roblem: Last two cannot be carried outindependently.


13/80

13 13

ARM

Solution

To return from S W I MOVS pc, r14To return from IRQ, FIQ or Prefetch Abort SUBS pc, r14, #4

To return from Data Aborts SUBS pc, r14, #8

The S modifier signifies special form of Instruction when the destination is P C

This way only if S P SR, r14 stored onto the Stack:

LD MFD r13!, {r0-r3,pc} ^


14/80

14 14

ARM

P riority

Reset

Data AbortFIQIRQPrefetch Abort

S W I, Undefined Instruction

There should be some doubt


15/80

15 15

ARM

W hen an exception occurs, thecore: Copies CPSR into SPSR_ Sets appropriate CPSR bits

If core implements ARM Architecture 4Tand is currently in Thumb state, then

ARM state is entered.Mode field bits

Interrupt disable flags if appropriate. Maps in appropriate banked registers Stores the return address in

LR_ Sets PC to vector address

To return, exception handler

Exception H andlingand the Vector Table

0x00000000

0x0000001 C

0x0000001 8

0x0000001 4

0x00000010

0x0000000 C

0x0000000 8

0x0000000 4

R eset

U ndefined Instruction

FIQ

IR Q

R eserved

Data Ab ort

Prefetch Ab ort

Software Interrupt


16/80

16 16

ARM

The Instruction P ipeline

The ARM uses a pipeline in order to increase thespeed of the flow of instructions to the processor.

Allows several operations to be undertaken simultaneously,rather than serially.FE TCH

DE COD E

EXE CUT E

Instruction fetched from memory

Decoding of registers used in instruction

Register(s) read from Register BankShift and ALU operationW rite register(s) back to Register Bank

PC

PC - 4

PC - 8

ARM


17/80

17 17

ARM

Quiz #1 - Verbal

W hat registers are used to store the program counter

and link register?W hat is r13 often used to store?W hich mode, or modes has the fewest availablenumber of registers available? How many and why?N

ame the exceptions in the ARM?Mention their priorities.W hat happens on E xception E ntry & E xception E xit?


18/80

18 18

ARM

ARM Instruction Set Format

Instruction typeD ata processing /

P SR Transfer

Multiply

Long Multiply(v3M / v4 only)

Swap

Load/StoreByte/Word

Load/Store MultipleH alfword transfer :

Immediate offset (v4only)

H alfword transfer:Re ister offset v4 onl

Cond 0 0 I Opc ode S R n Rd Ope ra nd2

Cond 0 0 0 0 0 0 A S R d Rn Rs 1 0 0 1 Rm

Cond 0 0 0 1 0 B 0 0 R n Rd 0 0 0 0 1 0 0 1 Rm

Cond 0 1 I P U B W L R n Rd Offs et

Cond 1 0 0 P U S W L R n Regi s te r L i s t

Cond 0 0 0 0 1 U A S R dHi Rd Lo Rs 1 0 0 1 Rm

Cond 0 0 0 P U 1 W L R n Rd Offs et 1 1 S H 1 Offs et2

Cond 1 0 1 L Offs et

Cond 1 1 0 P U N W L R n CRd CPNum Offs et

Cond 1 1 1 0 Op1 CRn CRd CPNum Op 2 0 CRm

Cond 1 1 1 0 Op1 L CRn Rd CPNum Op 2 1 CRm

Cond 1 1 1 1 SWI Numb e r

Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 R n

Cond 0 0 0 P U 0 W L R n Rd 0 0 0 0 1 S H 1 Rm

31 2827 1615 87 0


19/80

19 19

ARM

Conditional Execution

Most instruction sets only allow branches to beexecuted conditionally.However by reusing the condition evaluation hardware,

ARM effectively increases number of instructions. All instructions contain a condition field which determines

whether the CPU will execute them. N on-executed instructions soak up 1 cycle.

Still have to complete cycle so as to allow fetching and decodingof following instructions.

This removes the need for many branches, which stallthe pipeline (3 cycles to refill). Allows very dense in-line code, without branches. The Time penalty of not executing several conditional


20/80

20 20

ARM

The Condition Field

2831 24 20 16 12 8 4 0

Cond

0000 = EQ - Z s et (equa l)

0001 = NE - Z cl ear (no t equa l)

0010 = HS / C S - C set (unsigned higher o r sam e)

0011 = LO / CC - C clear (unsigned low er )

0100 = MI -N s et (negati ve)

0101 = PL - N cl ear (po sitive o r zer o )

0110 = VS - V set (o verf low )

0111 = VC - V clear (no o verf low )

1000 = HI - C set and Z clear (unsigned higher )

1001 = LS - C clear or Z (setunsigned lower or same)

1010 = GE - N set and V set,or N clear and V clear (>or =)

1011 = LT - N set and Vclear, or N clear and Vset (>)

1100 = GT - Z clear, andeither N set and V set,or N clear and V set(>)


21/80

21 21

ARM

Using and updating the ConditionField

To execute an instruction conditionally, simply postfix itwith the appropriate condition: For example an add instruction takes the form:

ADD r0,r1,r 2 ; r0 = r1 + r 2 (ADDAL) To execute this only if the zero flag is set:

ADDEQ r0,r1,r 2 ; If z e r o fla g s et then; ... r0 = r1 + r 2

By default, data processing operations do not affect thecondition flags (apart from the comparisons where thisis the only effect). To cause the condition flags to beupdated, the S bit of the instruction needs to be set bypostfixing the instruction (and any condition code) with

an S .


22/80

22 22

ARM

Branch : B{} labelBranch with Link : BL{}sub_routine_label

The offset for branch instructions is calculated bythe assembler:

By taking the difference between the branch instructionand the target address minus 8 (to allow for thepipeline).

This gives a 26 bit offset which is right shifted 2 bits (as

Branch instructions (1)

2831 24 0

Cond 1 0 1 L Offset

Condition field

L ink b it 0 = Bran ch1 = Bran ch w ith link

232527


23/80

23 23

ARM

Branch instructions (2)

W hen executing the instruction, the processor: shifts the offset left two bits, sign extends it to 32 bits, and adds

it to PC.E xecution then continues from the new PC, once thepipeline has been refilled.The "Branch with link" instruction implements asubroutine call by writing PC-4 into the LR of thecurrent bank.

i.e. the address of the next instruction following the branch withlink (allowing for the pipeline).

To return from subroutine, simply need to restore thePC from the LR:

MOV pc, lr


24/80

24 24

ARM

D ata processing Instructions

Largest family of ARM instructions, all sharing thesame instruction format.Contains: Arithmetic operations Comparisons (no results - just set condition codes) Logical operations Data movement between registers

Remember, this is a load / store architecture These instruction only work on registers, NOT NOT memory.

They each perform a specific operation on one or twooperands. First operand always a register - Rn


25/80

25 25

ARM

Arithmetic Operations

Operations are: ADD operand1 + operand2 ADC operand1 + operand2 + carry SUB operand1 - operand2 SBC operand1 - operand2 + carry -1 RSB operand2 - operand1 RSC operand2 - operand1 + carry - 1

Syntax: {}{S} Rd, Rn, Operand2

E xamples ADD r0, r1, r2

SUBGT r3, r3, #1 RSBL E S r4, r5, #5


26/80

26 26

ARM

Comparisons

The only effect of the comparisons is to UPDA TE THE CON DI T I ON FL AGS UPDA TE THE CON DI T I ON FL AGS. Thus no need to set S

bit.Operations are: CMP operand1 - operand2, but result not written CMN operand1 + operand2, but result not written TST operand1 A N D operand2, but result not written TE Q operand1 E OR operand2, but result not written

Syntax: {} Rn, Operand2

E xamples:

CMP r0, r1 TST E Q r2 #5


27/80

27 27

ARM

Logical Operations

Operations are: AN D operand1 A N D operand2 E OR operand1 E OR operand2 ORR operand1 OR operand2 BIC operand1 A N D N OT operand2 [ie bit clear]

Syntax: {}{S} Rd, Rn, Operand2

E xamples: AN D r0, r1, r2 E ORS r1,r3,r0


28/80

28 28

ARM

D ata Movement

Operations are: MOV operand2 MVN N OT operand2

N ote that these make no use of operand1.Syntax: {}{S} Rd, Operand2

E xamples: MOV r0, r1 MOVS r2, #10 MVNE Q r1,#0


29/80

29 29

ARM

Quiz #2Start

Stopr0 = r1?

r0 > r1?

r0 = r0 - r1 r1 = r1 - r0

Yes

No Yes

No

Convert the GCDalgorithm given in thisflowchart into

1) N ormal assembler,where only branchescan be conditional.

2)ARM assembler,

where all instructionsare conditional, thusimproving codedensity.

The only instructionsyou need are CMP, B


30/80

30 30

ARM

Quiz #2 - Sample Solutions

ARM Conditional Assembler

g c d cmp r0, r1 ;i f r0 > r1sub gt r0, r0, r1 ; sub t rac t r1

fr o m r0subl t r1, r1, r0 ;e ls e sub t rac t

r0 fr o m r1b ne g c d ; r e ac hed the

end?


31/80

31 31

ARM

The Barrel Shifter

The ARM doesnt have actual shift instructions.

Instead it has a barrel shifter which provides amechanism to carry out shifts as part of other instructions.

So what operations does the barrel shifter support?


32/80

32 32

ARM

Shifts left by the specified amount (multiplies

by powers of two) e.g.LSL #5 = multiply by 32

Barrel Shifter - Left Shift

L ogical Shift L eft ( L SL )

DestinationCF 0


33/80

33 33

ARM

Logical Shift RightShifts right bythe specifiedamount (dividesby powers of two)e.g.

LSR #5 = divideby 32

Arithmetic ShiftRightShifts right

(divides byowers of two

Barrel Shifter - Right Shifts

Destination CF

Destination CF

L ogical Shift R ight

A rithmetic Shift R ight

...0

Sign b it shifted in


34/80

34 34

ARM

Barrel Shifter - RotationsRotate Right (ROR)Similar to an ASR butthe bits wrap aroundas they leave the LSB

and appear as theMSB.

e.g. ROR #5N ote the last bitrotated is also usedas the Carry Out.

Rotate Right E xtended(RR X )

This operation uses

Destination CF

R otate R ight

Destination CF

R otate R ight through Carry


35/80

35 35

ARM

Using the Barrel Shifter:The Second Operand

* Immediate value8 bit num ber C an be r o tated right thr o ugh an even num ber o f

po sitio ns.Ass em bler w ill calculate r o tate f o r yo u fr om co nstant .

Register, optionallywith shift operationapplied.Shift value can be

either be: 5 bit unsigned integer Specified in bottom

byte of another register.

Operand1

Result

ALU

BarrelShifter

Operand2


36/80

36 36

ARM

Second Operand :Shifted Register

The amount by which the register is to be shiftedis contained in either: the immediate 5-bit field in the instruction

NO OVE RH E ADNO OVE RH E ADShift is done for free - executes in single cycle.

the bottom byte of a register (not PC)Then takes extra cycle to execute

ARM doesnt have enough read ports to read 3 registersat once.Then same as on other processors where shift isseparate instruction.

If no shift is specified then a default shift isapplied: LSL #0 i.e. barrel shifter has no effect on value in register.


37/80

37 37

ARM

Second Operand :Using a Shifted Register

Using a multiplication instruction to multiply by aconstant means first loading the constant into aregister and then waiting a number of internal cyclesfor the instruction to complete.

A more optimum solution can often be found by usingsome combination of MOVs, ADDs, SUBs and RSBswith shifts. Multiplications by a constant equal to a ((power of 2) 1) can

be done in one cycle.E xample: r0 = r1 * 5E xample: r0 = r1 + (r1 * 4)

ADD r0, r1, r1, LSL #2E xam le: r2 = r3 * 105


38/80

38 38

ARM

Second Operand :Immediate Value (1)

There is no single instruction which will load a 32 bitimmediate constant into a register without performing adata load from memory. All ARM instructions are 32 bits long ARM instructions do not use the instruction stream as data.

The data processing instruction format has 12 bitsavailable for operand2 If used directly this would only give a range of 4096.

Instead it is used to store 8 bit constants, giving arange of 0 - 255.These 8 bits can then be rotated right through an even

number of positions (ie RORs by 0, 2, 4,..30). This ives a much lar er ran e of constants that can be


39/80

39 39

ARM

Second Operand :Immediate Value (2)

This gives us: 0 - 255 [0 - 0xff] 256,260,264,..,1020 [0x100-0x3fc, step 4, 0x40-

0xff ror 30] 1024,1040,1056,..,4080 [0x400-0xff0, step 16, 0x40-0xff ror

28] 4096,4160, 4224,..,16320 [0x1000-0x3fc0, step 64, 0x40-0xff

ror 26]These can be loaded using, for example:

MOV r0, #0x40, 26 ; => MOV r0, #0x1000 (ie4096)

To make this easier, the assembler will convert to this

form for us if simply given the required constant:=


40/80

40 40

ARM

Loading full 32 bit constants

Although the MOV/MV N mechansim will load a largerange of constants into a register, sometimes thismechansim will not generate the required constant.Therefore, the assembler also provides a methodwhich will load ANY ANY 32 bit constant:

LDR r d ,= n ume r i c c on s t a nt

If the constant can be constructed using either a MOVor MV N then this will be the instruction actuallygenerated.Otherwise, the assembler will produce an LDRinstruction with a PC-relative address to read the

constant from a literal pool. LDR r0,=0x4 2 ; gene ra te s MOV r0,#0x4 2


41/80

41 41

ARM

Multiplication Instructions

The Basic ARM provides two multiplicationinstructions.Multiply MUL{}{S} Rd, Rm, Rs ; Rd = Rm * Rs

Multiply Accumulate - does addition for free MLA{}{S} Rd, Rm, Rs,Rn ; Rd = (Rm * Rs) + Rn

Restrictions on use: Rd and Rm cannot be the same register

Can be avoid by swapping Rm and Rs around. This worksbecause multiplication is commutative.

Cannot use PC.

These will be picked up by the assembler if overlooked.


42/80

42 42

ARM

Multiplication Implementation

The ARM makes use of Booths Algorithm to performinteger multiplication.On non-M ARMs this operates on 2 bits of Rs at a time.

For each pair of bits this takes 1 cycle (plus 1 cycle to startwith).

However when there are no more 1s left in Rs, themultiplication will early-terminate.

E xample: Multiply 18 and -1 : Rd = Rm * Rs

0 0 0 0 0 0 1 00 0 0 10 0 0 00 0 0 00 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 11 1 1 11 1 1 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1

R m

R s

17 cycles

R s

R m4 cycles

18

-1

18

-1


43/80

43 43

ARM

Extended Multiply Instructions

M variants of ARM cores contain extendedmultiplication hardware. This provides threeenhancements:

An 8 bit Booths Algorithm8 bit Booths Algorithm is usedMultiplication is carried out faster (maximum for standardinstructions is now 5 cycles).

E arly termination method improved E arly termination method improved so that now completes

multiplication when all remaining bit sets containall zeroes (as with non-M ARMs), or all ones.

Thus the previous example would early terminate in 2cycles in both cases.

64 bit results64 bit results can now be produced from two 32bitoperands


44/80

44 44

ARM

Multiply-Long andMultiply-Accumulate Long

Instructions are MULL which gives RdHi,RdLo:=Rm*Rs MLAL which gives RdHi,RdLo:=(Rm*Rs)+RdHi,RdLo

However the full 64 bit of the result now matter (lower precision multiply instructions simply throws top 32bitsaway)

N eed to specify whether operands are signed or unsigned

Therefore syntax of new instructions are: UMULL{}{S} RdLo,RdHi,Rm,Rs UMLAL{}{S} RdLo,RdHi,Rm,Rs SMULL{}{S} RdLo, RdHi, Rm, Rs SMLAL{}{S} RdLo, RdHi, Rm, Rs


45/80

45 45

ARM

Quiz #31. Specify instructions which will implement the following:a) r0 = 16 b) r1 = r0 * 4

c) r0 = r1 / 16 ( r1 signed 2's comp.) d) r1 = r2 * 7

2. W hat will the following instructions do?a) ADDS r0, r1, r1, LSL #2 b) RSB r2, r1, #0

3. W hat does the following instruction sequence do? ADD r0, r1, r1, LSL #1SUB r0, r0, r1, LSL #4

ADD r0, r0, r1, LSL #7


46/80

46 46

ARM

Load / Store Instructions

The ARM is a Load / Store Architecture: Does not support memory to memory data processing

operations. Must move data values into registers before using them.

This might sound inefficient, but in practice isnt: Load data values from memory into registers. Process data in registers using a number of data

processing instructions which are not slowed down bymemory access.

Store results from registers out to memory.

The ARM has three sets of instructions whichinteract with main memory. These are: Single register data transfer (LDR / STR).


47/80

47 47

ARM

Single register data transfer

The basic load and store instructions are: Load and Store W ord or Byte

LDR / STR / LDRB / STRB

ARM Architecture Version 4 also adds support for halfwords and signed data.

Load and Store HalfwordLDRH / STRH

Load Signed Byte or Halfword - load value and sign extend it to32 bits.

LDRSB / LDRSH

All of these instructions can be conditionally executedby inserting the appropriate condition code after STR /LDR.


48/80

48 48

ARM

Load and Store Word or Byte:Base Register

The memory location to be accessed is held in

a base register STR r0, [r1] ; Store contents of r0 to location

pointed to; by contents of r1.

LDR r2, [r1] ; Load r2 with contents of memorylocation

; pointed to by contents of r1.r1

0x200Base

Register

Memory

0x50x200

r0

0x5Source

Register for STR

r2

0x5D estination

Register for L D R


49/80

49 49

ARM

Load and Store Word or Byte:Offsets from the Base Register

As well as accessing the actual location contained inthe base register, these instructions can access alocation offset from the base register pointer.This offset can be

An unsigned 12bit immediate value (ie 0 - 4095 bytes). A register, optionally shifted by an immediate value

This can be either added or subtracted from thebase register: Prefix the offset value or register with + (default) or -.

This offset can be applied: before the transfer is made: PrePre- -indexed a ddressing indexed a ddressing

optionally autoauto--incrementing incrementing the base register, by postfixingthe instruction with an !.


50/80

50 50

ARM

Load and Store Word or Byte:P re-indexed Addressing

E xample: STR r0, [r1,#12]

To store to location 0x1f4 instead use: STR r0, [r1,#-

12]

r1

0x200Base

Register

Memory

0x5

0x200

r0

0x5Source

Register for STR

Offset

12 0x20c


51/80

51 51

ARM

Load and Store Word or Byte:P ost-indexed Addressing

E xample: STR r0, [r1], #12

To auto-increment the base register to location 0x1f4instead use:

STR r0, [r1], #-12

r1

0x200Original

BaseRegister

Memory

0x50x200

r0

0x5Source

Register for STR

Offset

12 0x20c

r1

0x20c Updated

BaseRegister


52/80

52 52

ARM

Load and Storeswith User Mode P rivilege

W hen using post-indexed addressing, there is a further form of Load/Store W ord/Byte: {}{B} T Rd,

W hen used in a privileged mode, this does the

load/store with user mode privilege. N ormally used by an exception handler that is emulating amemory access instruction that would normally execute in user mode.


53/80

53 53

ARM

Example Usage of Addressing Modes

Imagine an array, the first element of which is pointedto by the contents of r0.If we want to access a particular element,then we can use pre-indexed addressing:

r1 is element we want. LDR r2, [r0, r1, LSL #2]

If we want to step through everyelement of the array, for instanceto produce sum of elements in thearray, then we can use post-indexed addressing withina loop:

r1 is address of current element (initially equal to r0).

0

1

2

3

element

0

4

8

12

MemoryOffset

r0

Pointer tostart of array


54/80

54 54

ARM

Offsets for H alfword and SignedH alfword / Byte Access

The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as thebasic load and store instructions.However the actual offset formats are moreconstrained: The immediate value is limited to 8 bits (rather than 12 bits)

giving an offset of 0-255 bytes. The register form cannot have a shift applied to it.

ARM


55/80

55 55

ARM

Effect of endianess

The ARM can be set up to access its data in either little or big endian format.Little endian: Least significant byte of a word is stored in b its 0 b its 0 - -7 7 of an

addressed word.

Big endian: Least significant byte of a word is stored in b its 24b its 24 - -31 31 of

an addressed word.This has no real relevance unless data is stored aswords and then accessed in smaller sizedquantities (halfwords or bytes). W hich byte / halfword is accessed will depend on the

endianess of the system involved.

ARM


56/80

56 56

ARM

Endianess Example

Big-endianL ittle-endian

r1 = 0x100

r 0 = 0x11 22334431 24 23 16 15 8 7 0

11 22 33 44

31 24 23 16 15 8 7 0

11 22 33 44

31 24 23 16 15 8 7 0

44 33 22 11

31 24 23 16 15 8 7 0

00 00 00 44

31 24 23 16 15 8 7 0

00 00 00 11

r2 = 0x 44 r2 = 0x11

STR r0, [r1]

LD RB r2, [r1]

r1 = 0x100Memo ry

ARM


57/80

57 57

ARM

Quiz #4

W rite a segment of code that add together elements xto x+(n-1) of an array, where the element x=0 is thefirst element of the array.E ach element of the array is word sized (ie. 32 bits).The segment should use post-indexed addressing.

At the start of your segments, you should assume that: r0 points to the start of the array. r1 = x r2 = n r0

xx + 1

x + (n - 1)

Elements

{n elements

0

ARM


58/80

58 58

ARM

Quiz #4 - Sample SolutionADD r0, r0, r1, LSL# 2 ; Set r0 to a dd r e ss o fe l e ment xADD r 2 , r0, r 2 , LSL# 2 ; Set r 2 to a dd r e ss o fe l e ment n +1MOV r1, #0 ; I niti al i s e c o u nte r

l oo pLDR r3, [r0], #4 ; Acc e ss e l e ment a nd move to ne x t

ADD r1, r1, r3 ; Add c ontent s to c o u nte rCMP r0, r 2 ; Ha ve we r e ac hed e l e ment x+ n?

BLT l oo p ; If not - r e p e a t f o r

; ne x t e l e ment

ARM


59/80

59 59

ARM

BlockD

ata Transfer (1)

Cond 1 0 0 P U S W L R n R egister list

Condition field Base registerL oad/Store b it0 = S to re to m emo ry1 = L o ad fr om m emo ry

Write- b ack b it0 = no w rite - back 1 = w rite addre ss into base

PS R and force user b it0 = do nt lo ad PSR o r f o r ce user mo de1 = l o ad PSR o r f o r ce user mo de

U p/Down b it0 = D ow n; subtra ct o ff set fr om base1 = Up ; add o ff set to base

Pre/Post indexing b it0 = P o st; add o ff set after tran sfer ,1 = P re ; add o ff set bef o re tran sfer

2831 22 16 023 21 1527 20 1924

Each b it corresponds to a particular

register. For example:Bit 0 set causes r 0 to be tran sferred .Bit 0 unset causes r 0 no t to be tran sferred .

A t least one register must b etransferred as the list cannot b e empty.

The Load and Store Multiple instructions (LDM

/ STM) allow betweeen 1 and 16 registers to betransferred to or from memory.The transferred registers can be either: Any subset of the current bank of registers (default).

Any subset of the user mode bank of registers whenin a priviledged mode (postfix instruction with a ^ ).

ARM


60/80

60 60

ARM

BlockD

ata Transfer (2)

Base register used to determine where memory accessshould occur. 4 different addressing modes allow increment and decrement

inclusive or exclusive of the base register location. Base register can be optionally updated following the transfer

(by appending it with an !!. Lowest register number is always transferred to/from lowest

memory location accessed.

These instructions are very efficient for Saving and restoring context

For this useful to view memory as a stack. Moving large blocks of data around memory

For this useful to directly represent functionality of theinstructions.

ARM


61/80

61 61

ARM

Stacks

A stack is an area of memory which grows asnew data is pushed onto the top of it, andshrinks as data is popped off the top.Two pointers define the current limits of thestack. A base pointer

used to point to the bottom of the stack (the first location).

A stack pointer used to point the current top of the stack.

SP

BA SE

PUSH {1,2,3}

1

23

BA SE

SP

POP

1

2Result of pop = 3

BA SE

SP

ARM


62/80

62 62

ARM

Stack Operation

Traditionally, a stack grows down in memory, with thelast pushed value at the lowest address. The ARMalso supports ascending stacks, where the stackstructure grows up through memory.The value of the stack pointer can either:

Point to the last occupied address (Full stack)and so needs pre-decrementing (ie before the push)

Point to the next occupied address ( E mpty stack)and so needs post-decrementing (ie after the push)

The stack type to be used is given by the postfix to theinstruction:

STMFD / LDMFD : Full Descending stack STMFA / LDMFA : Full Ascending stack.

ARM


63/80

63 63

ARM

Stack ExamplesSTMFD sp!,{r0,r1,r3-r5}

r5r4r3

r1r0SP

Old SP

STMED sp!,{r0,r1,r3-r5}

r5r4r3r1

r0SP

Old SP

r5r4r3

r1r0

STMF A sp!,{r0,r1,r3-r5}

SP

Old SP 0x400

0x418

0x3e8

STME A sp!,{r0,r1,r3-r5}

r5r4

r3r1r0

SP

Old SP


64/80

ARM


65/80

65 65

ARM

D irect functionality of Block D ata Transfer

W hen LDM / STM are not being used to implementstacks, it is clearer to specify exactly what functionalityof the instruction is: i.e. specify whether to increment / decrement the base pointer,

before or after the memory access.

In order to do this, LDM / STM support a further syntaxin addition to the stack one: STMIA / LDMIA : Increment After STMIB / LDMIB : Increment Before STMDA / LDMDA : Decrement After STMDB / LDMDB : Decrement Before

ARM


66/80

66 66

ARM

Example: Block Copy Copy a block of memory, which is an exact multiple

of 12 words long from the location pointed to by r12to the location pointed to by r13. r14 points to the endof block to be copied.

; r12 points to the start of the sourcedata

; r14 points to the end of the source data

; r13 points to the start of thedestination dataloop L DMI Ar12!, {r0-r11} ; load 48 bytes

STMI Ar13!, {r0-r11} ; and store them C MP r12, r14 ; check for the end

r13

r14

r12

IncreasingM emory

ARM


67/80

67 67

ARM

Quiz #5

The contents of registers r0 to r6 need to be swappedaround thus: r0 moved into r3 r1 moved into r4 r2 moved into r6 r3 moved into r5 r4 moved into r0 r5 moved into r1 r6 moved into r2

W rite a segment of code that uses full descendingstack operations to carry this out, and hence requiresno use of any other registers for temporary storage.

ARM


68/80

68 68

ARM

Quiz #5 - Sample SolutionSTMFD sp!,

{r0-r6}LDMFD sp!,{r3,r4,r6}

r3 = r0r4 = r1r6 = r2

LDMFD sp!,{r5}

r5 = r3

LDMFD sp!,{r0-r2}

r0 = r4r1 = r5r2 = r6

Old SP

r5r4r3r2r1

SP

r6

r0

r5r4

SP

r6

r3

r5SP

r6

r4

SP

ARM


69/80

69 69

ARM

Atomic operation of a memory read followed by amemory write which moves byte or wordquantities between registers and memory.Syntax: S W P{}{B} Rd, Rm, [Rn]

Swap and Swap Byte Instructions

Rm Rd

Rn

32

1temp

Memory

ARM


70/80

70 70

ARM

Software Interrupt (SWI)

In effect, a S W I is a user-defined instruction.It causes an exception trap to the S W I hardware vector (thus causing a change to supervisor mode, plus theassociated state saving), thus causing the S W Iexception handler to be called.The handler can then examine the comment field of theinstruction to decide what operation has beenrequested.B makin use of the S W I mechansim, an o eratin

2831 2427 0

Cond 1 1 1 1 Comment field (ignored b y Processor)

Condition Field

23

ARM


71/80

71 71

ARM

PSR Transfer Instructions

MRS and MSR allow contents of CPSR/SPSR to betransferred from appropriate status register to a general

purpose register. All of status register, or just the flags, can be transferred.

Syntax: MRS{} R d , ; Rd = MSR{} ,Rm ; = Rm MSR{} ,Rm ; = Rm

where = CPSR, CPSR_all, SPSR o r SPSR_all = CPSR_fl g o r SPSR_fl g

Also an immediate form < > < >

ARM


72/80

72 72

ARM

Using MRS and MSR

Currently reserved bits, may be used in future,therefore:

they must be preserved when altering PSR the value they return must not be relied upon when testing

other bits.

Thus read-modify-write strategy must be followed whenmodifying any PSR:

Transfer PSR to register using MRS

Modify relevant bits

M odeN Z C V

2831 8 4 0

I F T

ARM


73/80

73 73

ARM

Coprocessors

The ARM architecture supports 16 coprocessorsE ach coprocessor instruction set occupies part of the

ARM instruction set.There are three types of coprocessor instruction Coprocessor data processing Coprocessor (to/from ARM) register transfers Coprocessor memory transfers (load and store to/from

memory) Assembler macros can be used to transform customcoprocessor mneumonics into the generic mneumonicsunderstood by the processor.

A coprocessor may be implemented

ARM


74/80

74 74

ARM

Coprocessor D

ataP

rocessing

This instruction initiates a coprocessor operationThe operation is performed only on internalcoprocessor state

For example, a Floating point multiply, which multiplies thecontents of two registers and stores the result in a third register

Syntax: CDP{}

,< o pc_1>, CRd , CRn , CRm,{< o pc_ 2 >}

31 28 27 26 2 5 2 4 2 3 2 0 19 1 6 15 1 2 11 8 7 5 4 3 0

Destination Register Source Registers

Opcode

Condition Code Specifier Opcode

Cond 1 1 1 0 opc_1 CRn CRd cp_num opc_2 0 CRm

ARM


75/80

75 75

Coprocessor Register Transfers

These two instructions move data between ARMregisters and coprocessor registers MRC : Move to Register from Coprocessor MCR : Move to Coprocessor from Register

An operation may also be performed on the data as itis transferred For example a Floating Point Convert to Integer instruction can

be implemented as a register transfer to ARM that alsoconverts the data from floating point format to integer format.

Syntax {}

,< o pc_1>,R d , CRn , CRm,< o pc_ 2 >

31 28 27 26 2 5 2 4 2 3 22 2 1 2 0 19 1 6 15 1 2 11 8 7 5 4 3 0

ARM Source/Dest Register Coprocesor Source/Dest Registers

Opcode

Condition Code Specifier Opcode

Transfer To/From Coprocessor

Cond 1 1 1 0 opc_1 L CRn Rd cp_num opc_2 1 CRm


76/80

ARM


77/80

77 77

Coprocessor Memory Transfers (2)

Syntax of these is similar to word transfers between ARM and memory:

{}{} , CRd ,PC relative offset generated if possible, else causes an error.

{}{}

, CRd ,Pre-indexed form, with optional writeback of the base register

{}{} , CRd ,Post-indexed form

where when present causes a long transfer to be performed

(N =1) else causes a short transfer to be performed ( N =0).

ARM


78/80

78 78

Quiz #6

W rite a short code segment that performs a modechange by modifying the contents of the CPSR The mode you should change to is user mode which has the

value 0x10. This assumes that the current mode is a priveleged mode such

as supervisor mode. This would happen for instance when the processor is reset -

reset code would be run in supervisor mode which would thenneed to switch to user mode before calling the main routine inyour application.

You will need to use MSR and MRS, plus 2 logical operations.M odeN Z C V2831 8 4 0

I F T

ARM


79/80

79 79

Quiz #6 - Sample Solution

Set up useful constants:

mmask EQU 0x1f ; mask to clear mode bits

userm EQU 0x10 ; user mode value

Start off here in supervisor mode.MRS r0, cpsr ; t ak e a c o py o f the CPSRBI C r0,r0,#mmask ; cl e ar the mode b it sORR r0,r0,#us e rm ; s e l e c t ne w m odeMSR cpsr, r0 ; wr ite back the modi f ied

; CPSR

ARM


80/80

80 80

Main features of theARM Instruction Set

All instructions are 32 bits long.Most instructions execute in a single cycle.E very instruction can be conditionally executed.

A load/store architecture Data processing instructions act only on registers

Three operand formatCombined ALU and shifter for high speed bit manipulation

Specific memory access instructions with powerful auto-indexing addressing modes.

32 bit and 8 bit data types and also 16 bit data types on ARM Architecture v4.

Flexible multiple register load and store instructions

New Armlect1

Documents

Transcript of New Armlect1