Instruction sets


Page 1: Instruction sets

© 2005 ECNU SEI Principles of Embedded Computing System Design

Instruction sets

Computer architecture taxonomy.
Assembly language.

Page 2: Instruction sets

von Neumann architecture (P.37)

Memory holds data, instructions.
Central processing unit (CPU) fetches instructions from memory.
Separate CPU and memory distinguishes programmable computer.
CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.

Page 3: Instruction sets

CPU + memory

[Figure: CPU and memory connected by address and data lines. The PC holds 200; memory location 200 holds ADD r5,r1,r3, which is fetched into the IR.]
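The fetch step on this slide can be sketched in C. This is a toy model rather than any real CPU: the `Instr` and `Cpu` types, the `OP_ADD` and `OP_HALT` opcodes, the `run` and `demo_add` functions, and the 256-word memory are all assumptions made for illustration. Only the instruction ADD r5,r1,r3 at address 200 comes from the slide.

```c
#include <stdint.h>

/* Hypothetical toy machine: one instruction per memory word. */
enum { OP_HALT = 0, OP_ADD = 1 };

typedef struct {
    uint8_t op, rd, rn, rm;   /* opcode, destination, two sources */
} Instr;

typedef struct {
    uint32_t pc;              /* program counter: address of next instruction */
    Instr    ir;              /* instruction register: instruction being executed */
    int32_t  r[16];           /* general-purpose registers */
} Cpu;

/* Run until HALT. Returns the number of instructions executed. */
int run(Cpu *cpu, const Instr *mem)
{
    int count = 0;
    for (;;) {
        cpu->ir = mem[cpu->pc];      /* fetch: memory[PC] -> IR */
        cpu->pc++;                   /* PC now points at the next instruction */
        if (cpu->ir.op == OP_HALT)
            return count;
        if (cpu->ir.op == OP_ADD)    /* execute: rd = rn + rm */
            cpu->r[cpu->ir.rd] = cpu->r[cpu->ir.rn] + cpu->r[cpu->ir.rm];
        count++;
    }
}

/* Place ADD r5,r1,r3 at address 200, run it, and return r5. */
int32_t demo_add(int32_t r1, int32_t r3)
{
    static Instr mem[256];           /* zero-initialized, so every word is HALT */
    Cpu cpu = { .pc = 200 };         /* remaining fields start at zero */
    mem[200] = (Instr){ OP_ADD, 5, 1, 3 };
    cpu.r[1] = r1;
    cpu.r[3] = r3;
    run(&cpu, mem);
    return cpu.r[5];
}
```

Calling `demo_add(2, 40)` fetches the single ADD from address 200, executes it, then stops at the HALT word that follows.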

Page 4: Instruction sets

Harvard architecture (P.38)

[Figure: CPU (with PC) connected to a data memory by address and data lines, and to a separate program memory by address and instruction lines.]

Page 5: Instruction sets

von Neumann vs. Harvard

Harvard can’t use self-modifying code.

Harvard allows two simultaneous memory fetches.

Most DSPs use Harvard architecture for streaming data: greater memory bandwidth; more predictable bandwidth.

Page 6: Instruction sets

RISC vs. CISC (P.38)

Complex instruction set computer (CISC): many addressing modes; many operations.

Reduced instruction set computer (RISC): load/store; pipelinable instructions.

Page 7: Instruction sets

Instruction set characteristics

Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.

Page 8: Instruction sets

Programming model

Programming model: registers visible to the programmer.

Some registers are not visible (IR).

Page 9: Instruction sets

Multiple implementations

Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes; etc.

Page 10: Instruction sets

Assembly language (P.39)

One-to-one with instructions (more or less).

Basic features:
    One instruction per line.
    Labels provide names for addresses (usually in first column).
    Instructions often start in later columns.
    Comments run to end of line.

Page 11: Instruction sets

ARM assembly language example

label1  ADR r4,c
        LDR r0,[r4]     ; a comment
        ADR r4,d
        LDR r1,[r4]
        SUB r0,r0,r1    ; another comment
        B label1

Page 12: Instruction sets

Pseudo-ops (P.40)

Some assembler directives don’t correspond directly to instructions:
    Define current address.
    Reserve storage.
    Constants.

Page 13: Instruction sets

Example

ARM:
    BIGBLOCK % 10

SHARC:
    .global BIGBLOCK
    .var BIGBLOCK[10]=0,0,0,0,0,0,0,0,0,0;

Page 14: Instruction sets

Instruction Set Architecture

ISA provides the level of abstraction for both the software and the hardware.

Page 15: Instruction sets

Crafting an ISA

Designing an ISA is both an art and a science.
ISA design involves dealing in an extremely scarce resource: instruction bits!

Some things we want out of our ISA:
    completeness
    orthogonality
    regularity and simplicity
    compactness
    ease of programming
    ease of implementation

Page 16: Instruction sets

Key ISA Decisions

Operations:
    how many?
    what kinds?
Operands:
    how many?
    location
    types
    how to specify?
Instruction format:
    how does the computer know what 0001 0100 1101 1111 means?
    size
    how many formats?
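One way to see how a machine "knows" what 0001 0100 1101 1111 means is to split the word into fixed bit fields. The sketch below assumes a made-up 16-bit format with four 4-bit fields (opcode, rd, rs, rt). The slide does not define this format; the `Fields` type and `decode` function are hypothetical names used only to illustrate the idea.

```c
#include <stdint.h>

/* Hypothetical 16-bit format: [15:12] opcode, [11:8] rd, [7:4] rs, [3:0] rt. */
typedef struct {
    unsigned opcode, rd, rs, rt;
} Fields;

/* Extract each field by shifting it to the bottom and masking off 4 bits. */
Fields decode(uint16_t word)
{
    Fields f;
    f.opcode = (word >> 12) & 0xF;
    f.rd     = (word >> 8)  & 0xF;
    f.rs     = (word >> 4)  & 0xF;
    f.rt     =  word        & 0xF;
    return f;
}
```

Under this assumed layout, the bit pattern from the slide (0x14DF) decodes to opcode 1, rd 4, rs 13, rt 15. A different format would assign the same bits a different meaning, which is why the format is a key design decision.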

Page 17: Instruction sets

Operand Location

Can classify machines into 3 types:
    Accumulator
    Stack
    Registers

Two types of register machines:
    register-memory
        most operands can be registers or memory
    load-store
        most operations (e.g., arithmetic) are only between registers
        explicit load and store instructions to move data between registers and memory

Page 18: Instruction sets

How Many Operands?

Accumulator:      1 address   add A          acc <- acc + mem[A]
Stack:            0 address   add            tos <- tos + next
Register-Memory:  2 address   add Ra B       Ra <- Ra + mem[B]
                  3 address   add Ra Rb C    Ra <- Rb + mem[C]
Load/Store:       3 address   add Ra Rb Rc   Ra <- Rb + Rc
                              load Ra Rb     Ra <- mem[Rb]
                              store Ra Rb    mem[Rb] <- Ra

Page 19: Instruction sets

Accumulator Architectures

One explicit operand per instruction:
    A <- A op M
    A <- A op *M
    *M <- A
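The three accumulator forms can be modeled with a few C functions. This is a sketch under stated assumptions: the `AccMachine` type, the 16-word memory, the choice of addition as the "op", and the `acc_load` helper are not on the slide and are introduced only for illustration.

```c
/* Hypothetical accumulator machine with a small word-addressed memory. */
typedef struct {
    int acc;       /* the single implicit operand, A */
    int mem[16];
} AccMachine;

/* Assumed helper (not on the slide): A <- M */
void acc_load(AccMachine *m, int addr)           { m->acc = m->mem[addr]; }

/* A <- A op M, with op chosen to be addition */
void acc_add(AccMachine *m, int addr)            { m->acc += m->mem[addr]; }

/* A <- A op *M: the word at addr holds the address of the operand */
void acc_add_indirect(AccMachine *m, int addr)   { m->acc += m->mem[m->mem[addr]]; }

/* *M <- A: the word at addr holds the address to store to */
void acc_store_indirect(AccMachine *m, int addr) { m->mem[m->mem[addr]] = m->acc; }
```

Every function takes exactly one address, matching the slide's point that each instruction names one explicit operand while the accumulator is implied.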

Page 20: Instruction sets

Stack Architectures

No explicit operands in ALU instructions; one in push/pop

A = B + C * D
    push b
    push c
    push d
    mul
    add
    pop a
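The push/mul/add/pop sequence can be traced with a small stack model in C. The `Stack` type, the `sm_` function names, and the fixed depth of 16 are assumptions for illustration; the sketch omits overflow and underflow checks to stay short.

```c
/* Hypothetical zero-address stack machine (no bounds checking). */
enum { STACK_MAX = 16 };

typedef struct {
    int data[STACK_MAX];
    int top;                 /* number of values currently on the stack */
} Stack;

void sm_push(Stack *s, int v) { s->data[s->top++] = v; }
int  sm_pop(Stack *s)         { return s->data[--s->top]; }

/* ALU ops name no operands: they pop two values and push the result. */
void sm_mul(Stack *s) { int y = sm_pop(s); int x = sm_pop(s); sm_push(s, x * y); }
void sm_add(Stack *s) { int y = sm_pop(s); int x = sm_pop(s); sm_push(s, x + y); }

/* A = B + C * D, following the slide's instruction sequence. */
int stack_eval(int b, int c, int d)
{
    Stack s = { .top = 0 };
    sm_push(&s, b);
    sm_push(&s, c);
    sm_push(&s, d);
    sm_mul(&s);              /* stack now holds: b, c*d */
    sm_add(&s);              /* stack now holds: b + c*d */
    return sm_pop(&s);       /* pop a */
}
```

Pushing the operands in the order b, c, d means the multiply sees c and d on top, so operator precedence is encoded purely by instruction order.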

Page 21: Instruction sets

Register-Set based Architectures

No memory addresses in ALU instructions (load/store architecture); typically 3-operand ALU ops

C = A + B
    LOAD  R1 <- A
    LOAD  R2 <- B
    ADD   R3 <- R1 + R2
    STORE C <- R3

Page 22: Instruction sets

Addressing Modes

Register direct    Add R4, R3
Immediate          Add R4, #3
Displacement       Add R4, 100(R1)
Indirect           Add R4, (R1)
Indexed            Add R3, (R1 + R2)
Direct             Add R1, (1001)
Memory indirect    Add R1, @(R3)
Autoincrement      Add R1, (R2)+
Autodecrement      Add R1, -(R2)
Scaled             Add R1, 100(R2)[R3]
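To make the modes concrete, the sketch below shows how the source operand might be located for each one. It assumes a hypothetical `Machine` with 8 registers and a 256-word, word-addressed memory, so autoincrement and autodecrement step by 1; a byte-addressed machine would step by the operand size instead. The function names and the explicit `scale` parameter are illustrative, not taken from the slide.

```c
#include <stdint.h>

/* Hypothetical machine state: 8 registers, small word-addressed memory. */
typedef struct {
    uint32_t r[8];
    uint32_t mem[256];
} Machine;

/* Each function returns the operand selected by one addressing mode. */
uint32_t op_register(const Machine *m, int rn) { return m->r[rn]; }
uint32_t op_immediate(uint32_t imm)            { return imm; }
uint32_t op_direct(const Machine *m, uint32_t addr) { return m->mem[addr]; }
uint32_t op_indirect(const Machine *m, int rn) { return m->mem[m->r[rn]]; }

uint32_t op_displacement(const Machine *m, uint32_t d, int rn)
{
    return m->mem[d + m->r[rn]];
}

uint32_t op_indexed(const Machine *m, int rn, int rm)
{
    return m->mem[m->r[rn] + m->r[rm]];
}

/* Memory indirect: the register points at a word that holds the address. */
uint32_t op_memory_indirect(const Machine *m, int rn)
{
    return m->mem[m->mem[m->r[rn]]];
}

/* Autoincrement: use the address in rn, then step rn to the next word. */
uint32_t op_autoincrement(Machine *m, int rn) { return m->mem[m->r[rn]++]; }

/* Autodecrement: step rn back one word, then use the new address. */
uint32_t op_autodecrement(Machine *m, int rn) { return m->mem[--m->r[rn]]; }

/* Scaled: displacement + base register + index register * scale. */
uint32_t op_scaled(const Machine *m, uint32_t d, int rn, int rm, uint32_t scale)
{
    return m->mem[d + m->r[rn] + m->r[rm] * scale];
}
```

Only the autoincrement and autodecrement functions take a non-const `Machine`, since they are the only modes that modify a register as a side effect.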

Page 23: Instruction sets

Addressing Mode Utilization

Page 24: Instruction sets

Encoding of Instruction Set

Page 25: Instruction sets

Our Desired ISA

Load-Store register arch
Addressing modes:
    immediate (8-16 bits) (256-65536)
    displacement (12-16 bits) (4k-64k)
    register deferred (register indirect)
Support a reasonable number of operations
Don’t use condition codes
Fixed instruction encoding/length for performance
Regularity (several general-purpose registers)
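The ranges quoted for immediates and displacements follow from the field width: an n-bit field encodes 2^n distinct values. The sketch below computes that count and also shows sign extension, which is how a narrow field is usually widened to a full register. The slide does not say whether these fields are signed, so `sign_extend` illustrates one possible interpretation; both function names are hypothetical.

```c
#include <stdint.h>

/* Number of distinct values an n-bit field can encode (n < 32). */
uint32_t field_values(unsigned n)
{
    return (uint32_t)1 << n;
}

/* Sign-extend the low n bits of x to 32 bits (1 <= n <= 31). */
int32_t sign_extend(uint32_t x, unsigned n)
{
    uint32_t sign = (uint32_t)1 << (n - 1);   /* the field's top bit */
    x &= (sign << 1) - 1;                     /* keep only the low n bits */
    return (int32_t)((x ^ sign) - sign);      /* replicate the top bit upward */
}
```

For example, `field_values(8)` and `field_values(16)` give the 256 and 65536 on the slide, and `field_values(12)` gives the 4k lower bound quoted for displacements.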

Page 26: Instruction sets

MIPS Instruction Format

Page 27: Instruction sets

ARM IS

Page 28: Instruction sets

Compiler/ISA Interaction

Compiler is primary customer of ISA
Features the compiler doesn’t use are wasted
Register allocation is a huge contributor to performance
Compiler-writer’s job is made easier when ISA has:
    regularity
    primitives, not solutions
    simple trade-offs
Compiler wants simplicity over power

Page 29: Instruction sets

Program Usage of Addressing Modes

Page 30: Instruction sets

A simple loop

int A[100], B[100], C;

int main(void)
{
    int i;
    C = 10;                       /* the global is C, not c */
    for (i = 0; i < 100; i++)
        A[i] = B[i] + C;
}

Page 31: Instruction sets

Unoptimized code

C=10;
        li   r14, 10
        sw   r14, C

for (i=0; i<100; i++)
        sw   r0, 4(sp)

$33: A[i] = B[i] + C
        lw   r14, 4(sp)
        mul  r15, r14, 4
        lw   r24, B(r15)
        lw   r25, C
        addu r8, r24, r25
        lw   r16, 4(sp)
        mul  r17, r16, 4
        sw   r8, A(r17)
        lw   r9, 4(sp)
        addu r10, r9, 1
        sw   r10, 4(sp)
        blt  r10, 100, $33

        j    $31

12 instructions per iteration

Page 32: Instruction sets

Optimized code

C=10;
        li   r14, 10
        sw   r14, C

for (i=0; i<100; i++)
        la   r3, A
        la   r4, B
        la   r6, B+400

$33: A[i] = B[i] + C
        lw   r14, 0(r4)
        addu r15, r14, 10
        sw   r15, 0(r3)
        addu r3, r3, 4
        addu r4, r4, 4
        bltu r4, r6, $33

        j    $31

6 instructions per iteration
4 fewer loads due to code motion, register allocation, constant propagation
2 fewer multiplies due to induction variable elimination, strength reduction

Can you do better by hand?
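As a starting point for that question, the compiler's transformations can be written out by hand in C. The sketch below is one possible rendering, not the deck's own answer: `loop_indexed` restates the original loop, and `loop_pointers` mirrors the optimized assembly with two walking pointers and an end pointer. The function names and the `N` constant are assumptions for illustration.

```c
enum { N = 100 };
int A[N], B[N], C;

/* Straightforward version, equivalent to the original simple loop. */
void loop_indexed(void)
{
    C = 10;
    for (int i = 0; i < N; i++)
        A[i] = B[i] + C;
}

/* Pointer version mirroring the optimized assembly. */
void loop_pointers(void)
{
    int *a = A;                 /* la r3, A */
    const int *b = B;           /* la r4, B */
    const int *end = B + N;     /* la r6, B+400 (100 ints of 4 bytes each) */
    C = 10;
    while (b < end)             /* bltu r4, r6, $33 */
        *a++ = *b++ + 10;       /* constant 10 used in place of loading C */
}
```

The index variable and its multiply by 4 are gone (induction variable elimination and strength reduction), and C is never reloaded inside the loop (constant propagation). A modern compiler will often produce similar code from `loop_indexed` on its own, so measuring is advisable before hand-tuning.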