lec-3 ISA

Instruction Set ArchitectureInstruction Set ArchitectureInstruction Set ArchitectureInstruction Set Architecture

Ajit PalAjit Pal

ProfessorProfessorDepartment of Computer Science and EngineeringDepartment of Computer Science and EngineeringIndian Institute of Technology KharagpurIndian Institute of Technology KharagpurINDIA-721302INDIA-721302

High Performance Computer High Performance Computer ArchitectureArchitecture

Outline

An Instruction

Instruction Set Architecture

Classification of ISAs

Classification of Operations

Operands

Instruction Format

Addressing Modes

Evolution of the Instruction Set

RISC Versus CISC

MIPS Instruction Set

Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur

An An InstructionInstructionAn instruction can be considered as a word in processor’s languageWhat information an Instruction should covey to the CPU?

opcodeopcode Addr of OP1Addr of OP1 Addr of OP2Addr of OP2 Dest AddrDest Addr Addr of next intrAddr of next intr

The instruction is too long if everything specified explicitly

More space in memoryLonger execution time

How can you reduced the size of an instruction?Specifying information implicitlyHow?

Using PC, ACC, GPRs, SP


Instruction Set Architecture

“Instruction Set Architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine.”

The ISA defines: Operations that the processor can executeData Transfer mechanisms + how to access dataControl Mechanisms (branch, jump, etc)“Contract” between programmer/compiler and HW

ISA is important:ISA is important:Not only from the programmer’s perspective.Not only from the programmer’s perspective.From processor design and implementation From processor design and implementation perspectives as well.perspectives as well.


InstructionInstruction Set Architecture Set Architecture

Programmer visible part of a processor:Programmer visible part of a processor:

■ RegistersRegisters (where are data located?)(where are data located?)

■ Addressing ModesAddressing Modes (how is data accessed?) (how is data accessed?)

■ Instruction FormatInstruction Format (how are instructions (how are instructions

specified?)specified?)

■ Exceptional ConditionsExceptional Conditions (what happens if (what happens if

something goes wrong?) something goes wrong?)

■ Instruction SetInstruction Set (what operations can be (what operations can be

performed?)performed?)


ISA Design Choices Types of operations supported– e.g. arithmetic/logical, data transfer, control transfer, system, floating-point, decimal (BCD), string

Types of operands supported– e.g. byte, character, digit, halfword, word (32-bits), doubleword, floating-point number

Types of operand storage allowed– e.g. stack, accumulator, registers, memory

Implicit vs. explicit operands in instructions and number of each Orthogonality of operands, operand location, andaddressing modes


Classification of ISAsClassification of ISAs

Determined by the means used for storing data in CPU:Determined by the means used for storing data in CPU:

The major choices are:The major choices are:■ A stackA stack, , an accumulator,an accumulator, or a set of registers or a set of registers

Stack architecture:Stack architecture:■ Operands are implicitly on top of the stackOperands are implicitly on top of the stack

Accumulator architecture:Accumulator architecture:■ One operand is in the accumulator (register) and the others are One operand is in the accumulator (register) and the others are

elsewhereelsewhere■ Essentially this is a 1 register machineEssentially this is a 1 register machine■ Found in older machines…Found in older machines…

General purpose registers:General purpose registers:■ Operands are in registers or specific memory locations.Operands are in registers or specific memory locations.


Evolution of ISAsEvolution of ISAs

Single Accumulator

(Manchester Mark I, IBM 700 series 1953)

General Purpose Register Machines

Complex Instruction Sets RISC

(Vax, Intel 386 1977-85)(MIPS, IBM RS6000, . . .1987)

Stack(Burroughs, HP-3000 1960-70)

Accumulator Architectures (EDSAC, IBM-701)Accumulator Architectures (EDSAC, IBM-701) Special-purpose Register ArchitecturesSpecial-purpose Register Architectures General purpose registers architecturesGeneral purpose registers architectures

Register-memory (IBM 360, DEC PDP-11, Intel 80386Register-memory (IBM 360, DEC PDP-11, Intel 80386 Register-register (load-store) (CDC6600, MIPS, DEC alpha)Register-register (load-store) (CDC6600, MIPS, DEC alpha)


Classification of ISAs

Types of

Architecture

Source

Operands

Destination

Stack Top two

elements in

stack

Top of stack

Accumulator Accumulator (1)

Memory (other)

Accumulator

Register set Register or

Memory

Register or

Memory


Comparison of ArchitecturesComparison of Architectures

Consider the operation: C = A + B

Stack Accumulator Register-Memory Register-RegisterPush A Load A Load R1, A Load R1, APush B Add B Add R1, B Load R2, BAdd Store C Store C, R1 Add R3, R1, R2Pop C Store C, R3


Classification of OperationsClassification of Operations

Data Transfer Data Transfer – – Load, store, movLoad, store, mov

Data ManipulationData Manipulation Arithmetic Arithmetic – Add, subtract, multiply, divide– Add, subtract, multiply, divide Signed, unsigned, integer, floating-pointSigned, unsigned, integer, floating-point Logical - Logical - Conjunction, disjunction, shift left, shift Conjunction, disjunction, shift left, shift

rightright

Status manipulationStatus manipulation

Control TransferControl Transfer Conditional BranchConditional Branch Branch on equal, not equalBranch on equal, not equal Set on less thanSet on less than Unconditional JumpUnconditional Jump


Instruction FormatInstruction Format

Instruction length needs to be in multiples of Instruction length needs to be in multiples of

bytes.bytes.

Instruction encoding can be:Instruction encoding can be:■ Variable or FixedVariable or Fixed

Variable encoding tries to use as few bits to Variable encoding tries to use as few bits to

represent a program as possible:represent a program as possible:■ But at the cost of complexity of decodingBut at the cost of complexity of decoding

Alpha, ARM, MIPS instructions:Alpha, ARM, MIPS instructions:

Operation and Modes

Addr1 Addr2 Addr3


Addressing Modes

Addressing modes describe how an instruction can find the location of its operands

• Effective address = actual memory address specified by an addressing mode


Addressing ModesAddressing Modes

INHERENT (0-address) INHERENT (0-address)

IMMEDIATE IMMEDIATE

ABSOLUTE (DIRECT) ABSOLUTE (DIRECT)

O PC O D E

O PC O D E

O PER AN D

O PER AN D

O PCO DE

ADDRESS

O PERAND

M EM O RY


INDIRECT INDIRECT

REGISTER REGISTER

REGISTER REGISTER

INDIRECT INDIRECT


O PC O D E

AD D R ESS

A2

M EM O R Y

O PER AN D A2

O PCO DE R k REG ISTERS

l

k

n

O PC O D E R P k

R EG ISTERPAIR S

(R Pk)

M EM O R Y

O PER AN D


PAGEDPAGED

INDEXEDINDEXED


O PCO DE

O FFSET

O PERAND

ADDRESS

DIRECT-PAG EREG ISTER

M EM O RY

O P C O D E X

B A S EA D D R E S S

O FFS E T

+

O P E R A N D

M E M O R Y

IN D E XR E G IS TE R S


BASED BASED BASED-INDEXEDBASED-INDEXED

RELATIVERELATIVE O PC O D E

D ISPLACEM EN T

+

PR O G R AM C O U N TER

M EM O R Y

O PER AN D

O PC O D E

M EM O R Y

C O N STAN T

+

BASER EG ISTER

O PER AN D



STACKSTACK

Em pty

Em pty

Em pty

SP A

Em pty

Em pty

SP

PU SH A

A

B

Em pty

SP

PU SH B

Em pty

A

B

Em pty

SP

PO P B

Em pty

(c) (d)



RISC/CISCRISC/CISC Controversy Controversy

RISC: Reduced Instruction Set ComputerRISC: Reduced Instruction Set Computer CISC: Complex Instruction Set ComputerCISC: Complex Instruction Set Computer Genesis of CISC architecture:Genesis of CISC architecture:

■ Implementing commonly used instructions in Implementing commonly used instructions in hardware can lead to significant performance hardware can lead to significant performance benefits.benefits.

■ For example, use of a FP processor can lead to For example, use of a FP processor can lead to performance improvements.performance improvements.

Genesis of RISC architecture:Genesis of RISC architecture:

■ The rarely used instructions can be eliminated to The rarely used instructions can be eliminated to save chip space --- on chip cache and large save chip space --- on chip cache and large number of registers can be provided.number of registers can be provided.


FeaturesFeatures of A CISC Processor of A CISC Processor

Rich instruction set:Rich instruction set:■ some simple, some very complex some simple, some very complex

Complex addressing modes:Complex addressing modes:■ Orthogonal addressingOrthogonal addressing (Every possible (Every possible

addressing mode for every instruction).addressing mode for every instruction).

Many instructions take multiple cycles:Many instructions take multiple cycles: Large variation in CPILarge variation in CPI

Instructions are of variable sizes Instructions are of variable sizes Small number of registersSmall number of registers Microcode control Microcode control No (or inefficient) pipelining No (or inefficient) pipelining


TheThe CISC Philosophy CISC Philosophy

One instruction could do the work of several One instruction could do the work of several instructions. instructions. ■ For example, a single instruction could load For example, a single instruction could load

two numbers to be added, add them, and then two numbers to be added, add them, and then store the result back to memory directly.store the result back to memory directly.

Many versions of the same instructions were Many versions of the same instructions were supported; supported; ■ Different versions did almost the same thing Different versions did almost the same thing

with minor changes. with minor changes. ■ For example, one version would read two For example, one version would read two

numbers from memory, and store the result in numbers from memory, and store the result in a register. Another version would read one a register. Another version would read one number from memory and the other from a number from memory and the other from a register and store the result to memory.register and store the result to memory.


FeaturesFeatures of a RISC Processor of a RISC Processor

Small number of instructionsSmall number of instructions

Small number of addressing modes Small number of addressing modes

Large number of registers (>32)Large number of registers (>32)

Instructions execute in one or two clock cycles Instructions execute in one or two clock cycles

Uniformed length instructions and fixed Uniformed length instructions and fixed instruction format. instruction format.

Register-Register Architecture:Register-Register Architecture: Separate memory instructions (load/store) Separate memory instructions (load/store)

Separate instruction/data cacheSeparate instruction/data cache

Hardwired controlHardwired control

PipeliningPipelining (Why CISC are not pipelined?) (Why CISC are not pipelined?)


Early RISC ProcessorsEarly RISC Processors

1987 Sun SPARC1987 Sun SPARC

1990 IBM RS 60001990 IBM RS 6000

1996 IBM/Motorola PowerPC 1996 IBM/Motorola PowerPC


CISC vs. RISC CISC vs. RISC OrganizationsOrganizations

Main Memory Main Memory

MicroprogrammedControl Unit

MicroprogrammedControl Memory

CacheHardwared Control Unit

InstructionCache

DataCache

(a) CISC Organization (b) RISC Organization


Why Does RISC Lead to Improved Performance?Why Does RISC Lead to Improved Performance?

Increased GPRs (Also, Register Windows)Increased GPRs (Also, Register Windows)

■ leads to decreased data traffic to memory.leads to decreased data traffic to memory.

Register-Register architecture leads to more Register-Register architecture leads to more uniform instructions:uniform instructions:

■ Efficient pipelining becomes possible.Efficient pipelining becomes possible.

However, large instruction memory traffic:However, large instruction memory traffic:

■ Because of larger number of instructions.Because of larger number of instructions.


RISC-Like instructions can be scheduled to achieve RISC-Like instructions can be scheduled to achieve efficiency:efficiency:■ Either by compiler orEither by compiler or■ By hardware (Dynamic instruction scheduling).By hardware (Dynamic instruction scheduling).

Suppose we need to add 2 values and store the Suppose we need to add 2 values and store the results back to memory.results back to memory.

In CISC:In CISC:■ It would be done using a single instruction.It would be done using a single instruction.

In RISC:In RISC:■ 4 instructions would be necessary.4 instructions would be necessary.

Why Does RISC Lead to Improved Performance?Why Does RISC Lead to Improved Performance?


Case Study

MIPS Instruction Set architecture


Registers– MIPS has 32 architected 32-bit registers– Effective use of registers is key to program performance

Data Transfer– 32 words in registers, millions of words in main memory– Instruction accesses memory via memory address– Address indexes memory, a large single-dimensional array

Alignment– Most architectures address individual bytes– Addresses of sequential words differ by 4– Words must always start at addresses that are multiples of 4

Operands in MIPS


r0: always 0r1: reserved for the assemblerr2, r3: function return valuesr4~r7: function call argumentsr8~r15: “caller-saved” temporariesr16~r23 “callee-saved” temporariesr24~r25 “caller-saved” temporariesr26, r27: reserved for the operating systemr28: global pointerr29: stack pointerr30: callee-saved temporariesr31: return address

Register Usage Conventions Registers– MIPS has 32 architected 32-bit registers– Effective use of registers is key to program performance


Data Types in MIPS

Data and instructions are both represented in bits– 32-bit architectures employ 32-bit instructions– Combination of fields specifying operations/operands

MIPS operates on:32-bit (unsigned or 2’s complement) integers,32-bit (single precision floating point) real numbers, 64-bit (double precision floating point) real numbers; 32-bit words, bytes and half words can be loaded into GPRs After loading into GPRs, bytes and half words are either zero or sign bit expanded to fill the 32 bits; Only 32-bit units can be loaded into FPRs and 32-bit real numbers are stored in even numbered FPRs. 64-bit real numbers are stored in two consecutive FPRs, starting with even-numbered register.Floating-point numbers IEEE standard 754 float: 8-bit exponent, 23-bit significand double:11-bit exponent, 52-bit significand


MIPS Memory OrganizationMIPS Memory Organization

• MIPS supports byte addressability:

it implies that a byte is the smallest unit with its

address;

32-bit word has to start at byte address that is

multiple of 4;

Thus, 32-bit word at address 4n includes four bytes

with addresses: 4n, 4n+1, 4n+2, and 4n+3.

16-bit half word has to start at byte address that is

multiple of 2; Thus, 16-bit word at address 2n includes

two bytes with addresses: 2n and 2n+1.

it implies that an address is given as 32-bit unsigned


MIPS Instruction Formats

R-Type Instruction Fields– op: basic operation of instruction, called the opcode– rs, rt: first, second register source operand– rd: register destination operand– shamt: shift amount for shift instructions– funct: specifies a variant of the operation, called function code


• I-Type Instruction Fields– Opcode specifies instruction format

• J-Type Instruction Fields

MIPS Instruction Formats


MIPS Addressing ModesMIPS Addressing Modes

• Register Addressing– Operand is a register (e.g., add $s2, $s0, $s1)

• Base or Displacement Addressing– Operand is at memory location whose address is sum of register and constant (e.g., lw $s1, 8($s0))


• Immediate Addressing– Operand is constant within instruction (e.g., addi $s1, $s0, 4)


• PC-Relative Addressing– Address is sum of PC and constant within instruction (e.g., beq $s0, $s1, 16)



• Pseudo-direct Addressing– Address is 26 bits of constant within instruction concatenated with upper 6 bits of PC (e.g., j 1000)


Survey of MIPS Instruction Set• Arithmetic

• Addition– add $s1, $s2, $s3 # $s1 = $2 + $s3, overflow detected– addu $s1, $s2, $s3 # $s1 = $s2 x $s3, overflow undetected

• Subtraction– sub $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow detected– subu $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow undetected

• Immediate/Constants– addi $s1, $s2, 100 # $s1 = $s2 + 100, overflow detected– addiu $s1, $s2, 100 # $s1 = $s2 + 100, overflow undetected


• Multiply– mult $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit signed product– multu $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit unsigned product

• Divide– div $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3– divu $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3, unsigned

• Ancillary– mfhi $s1 # $s1 = Hi, get copy of Hi– mflo $s1 # $s1 = Lo, get copy of low

• Arithmetic

Survey of MIPS Instruction Set


• Boolean Operations– and $s1, $s2, $s3 # $s1 = $s2 & $s3, bit-wise AND– or $s1, $s2, $s3 # $s1 = $s2 | $s3, bit-wise OR

• Immediate/Constants– andi $s1, $s2, 100 # $s1 = $s2 & 100, bit-wise AND– ori $s1, $s2, 100 # $s1 = $s2 | 100, bit-wise OR

• Shifting– sll $s1, $s2, 10 # $s1 = $s2 << 10, shift left– srl $s1, $s2, 10 # $s1 = $s2 >> 100, shift right


• Logical


• Load Operations– lw $s1, 100($s2) # $s1 = Mem($s2+100), load word– lbu $s1, 100($s2) # $s1 = Mem($s2+100), load byte– lui $s1, 100 # $s1 = 100 * 2^16, load upper imm.

• Store Operations– sw $s1, 100($s2) # Mem[$s2+100] = $s1, store word– sb $s1, 100($s2) # Mem[$s2+100] = $s1, store byte


• Data Transfer


Survey of MIPS Instruction Set• Control Transfer

• Jump Operations– j 2500 # go to 10000– jal 2500 # $ra = PC+4, go to 10000, for procedure call– jr $ra # go to $ra, for procedure return

• Branch Operations– beq $s1, $s2, 25 # if($s1==$s2), go to PC+4+100, PC-relative– bne $s1, $s2, 25 # if($s1!=$s2), go to PC+4+100, PC-relative

• Comparison Operations– slt $s1, $s2, $s3 # if($s2<$s3) $s1=1, else $s1=0, 2’s comp.– slti $s1, $s2, 100 # if($s2<100), $s1=1, else $s1=0, 2’s comp.– sltu $s1, $s2, $s3 # if($s2<$s3), $s1=1, else $s1=0, unsigned– sltiu $s1, $s2, 100 # if($s2<100) $s1=1, else $s1=0, unsigned


Survey of MIPS Instruction Set• Floating Point Operations

• Arithmetic– {add.s, sub.s, mul.s, div.s} $f2, $f4, $f6 # single-precision– {add.d, sub.d, mul.d, div.d} $f2, $f4, $f6 # double-precision– Floating-point registers are used in even-odd pairs, using evennumber register as its name

• Data Transfer– {lwc1, swc1} $f1, 100($s2)– Transfer data to/from floating-point register file

• Conditional Branch– {c.lt.s, c.lt.d} $f2, $f4 # if ($f2 < $f4), cond=1, else cond=0– {bclt, bclf} 25 # if (cond==1/0), go to PC+4+100


Thanks!Thanks!

lec-3 ISA

Documents

Transcript of lec-3 ISA