lec-3 ISA
-
Upload
himanshuagra -
Category
Documents
-
view
119 -
download
8
Transcript of lec-3 ISA
Instruction Set ArchitectureInstruction Set ArchitectureInstruction Set ArchitectureInstruction Set Architecture
Ajit PalAjit Pal
ProfessorProfessorDepartment of Computer Science and EngineeringDepartment of Computer Science and EngineeringIndian Institute of Technology KharagpurIndian Institute of Technology KharagpurINDIA-721302INDIA-721302
High Performance Computer High Performance Computer ArchitectureArchitecture
Outline
An Instruction
Instruction Set Architecture
Classification of ISAs
Classification of Operations
Operands
Instruction Format
Addressing Modes
Evolution of the Instruction Set
RISC Versus CISC
MIPS Instruction Set
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
An An InstructionInstructionAn instruction can be considered as a word in processor’s languageWhat information an Instruction should covey to the CPU?
opcodeopcode Addr of OP1Addr of OP1 Addr of OP2Addr of OP2 Dest AddrDest Addr Addr of next intrAddr of next intr
The instruction is too long if everything specified explicitly
More space in memoryLonger execution time
How can you reduced the size of an instruction?Specifying information implicitlyHow?
Using PC, ACC, GPRs, SP
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Instruction Set Architecture
“Instruction Set Architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine.”
The ISA defines: Operations that the processor can executeData Transfer mechanisms + how to access dataControl Mechanisms (branch, jump, etc)“Contract” between programmer/compiler and HW
ISA is important:ISA is important:Not only from the programmer’s perspective.Not only from the programmer’s perspective.From processor design and implementation From processor design and implementation perspectives as well.perspectives as well.
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
InstructionInstruction Set Architecture Set Architecture
Programmer visible part of a processor:Programmer visible part of a processor:
■ RegistersRegisters (where are data located?)(where are data located?)
■ Addressing ModesAddressing Modes (how is data accessed?) (how is data accessed?)
■ Instruction FormatInstruction Format (how are instructions (how are instructions
specified?)specified?)
■ Exceptional ConditionsExceptional Conditions (what happens if (what happens if
something goes wrong?) something goes wrong?)
■ Instruction SetInstruction Set (what operations can be (what operations can be
performed?)performed?)
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
ISA Design Choices Types of operations supported– e.g. arithmetic/logical, data transfer, control transfer, system, floating-point, decimal (BCD), string
Types of operands supported– e.g. byte, character, digit, halfword, word (32-bits), doubleword, floating-point number
Types of operand storage allowed– e.g. stack, accumulator, registers, memory
Implicit vs. explicit operands in instructions and number of each Orthogonality of operands, operand location, andaddressing modes
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Classification of ISAsClassification of ISAs
Determined by the means used for storing data in CPU:Determined by the means used for storing data in CPU:
The major choices are:The major choices are:■ A stackA stack, , an accumulator,an accumulator, or a set of registers or a set of registers
Stack architecture:Stack architecture:■ Operands are implicitly on top of the stackOperands are implicitly on top of the stack
Accumulator architecture:Accumulator architecture:■ One operand is in the accumulator (register) and the others are One operand is in the accumulator (register) and the others are
elsewhereelsewhere■ Essentially this is a 1 register machineEssentially this is a 1 register machine■ Found in older machines…Found in older machines…
General purpose registers:General purpose registers:■ Operands are in registers or specific memory locations.Operands are in registers or specific memory locations.
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Evolution of ISAsEvolution of ISAs
Single Accumulator
(Manchester Mark I, IBM 700 series 1953)
General Purpose Register Machines
Complex Instruction Sets RISC
(Vax, Intel 386 1977-85)(MIPS, IBM RS6000, . . .1987)
Stack(Burroughs, HP-3000 1960-70)
Accumulator Architectures (EDSAC, IBM-701)Accumulator Architectures (EDSAC, IBM-701) Special-purpose Register ArchitecturesSpecial-purpose Register Architectures General purpose registers architecturesGeneral purpose registers architectures
Register-memory (IBM 360, DEC PDP-11, Intel 80386Register-memory (IBM 360, DEC PDP-11, Intel 80386 Register-register (load-store) (CDC6600, MIPS, DEC alpha)Register-register (load-store) (CDC6600, MIPS, DEC alpha)
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Classification of ISAs
Types of
Architecture
Source
Operands
Destination
Stack Top two
elements in
stack
Top of stack
Accumulator Accumulator (1)
Memory (other)
Accumulator
Register set Register or
Memory
Register or
Memory
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Comparison of ArchitecturesComparison of Architectures
Consider the operation: C = A + B
Stack Accumulator Register-Memory Register-RegisterPush A Load A Load R1, A Load R1, APush B Add B Add R1, B Load R2, BAdd Store C Store C, R1 Add R3, R1, R2Pop C Store C, R3
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Classification of OperationsClassification of Operations
Data Transfer Data Transfer – – Load, store, movLoad, store, mov
Data ManipulationData Manipulation Arithmetic Arithmetic – Add, subtract, multiply, divide– Add, subtract, multiply, divide Signed, unsigned, integer, floating-pointSigned, unsigned, integer, floating-point Logical - Logical - Conjunction, disjunction, shift left, shift Conjunction, disjunction, shift left, shift
rightright
Status manipulationStatus manipulation
Control TransferControl Transfer Conditional BranchConditional Branch Branch on equal, not equalBranch on equal, not equal Set on less thanSet on less than Unconditional JumpUnconditional Jump
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Instruction FormatInstruction Format
Instruction length needs to be in multiples of Instruction length needs to be in multiples of
bytes.bytes.
Instruction encoding can be:Instruction encoding can be:■ Variable or FixedVariable or Fixed
Variable encoding tries to use as few bits to Variable encoding tries to use as few bits to
represent a program as possible:represent a program as possible:■ But at the cost of complexity of decodingBut at the cost of complexity of decoding
Alpha, ARM, MIPS instructions:Alpha, ARM, MIPS instructions:
Operation and Modes
Addr1 Addr2 Addr3
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Addressing Modes
Addressing modes describe how an instruction can find the location of its operands
• Effective address = actual memory address specified by an addressing mode
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Addressing ModesAddressing Modes
INHERENT (0-address) INHERENT (0-address)
IMMEDIATE IMMEDIATE
ABSOLUTE (DIRECT) ABSOLUTE (DIRECT)
O PC O D E
O PC O D E
O PER AN D
O PER AN D
O PCO DE
ADDRESS
O PERAND
M EM O RY
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
INDIRECT INDIRECT
REGISTER REGISTER
REGISTER REGISTER
INDIRECT INDIRECT
Addressing ModesAddressing Modes
O PC O D E
AD D R ESS
A2
M EM O R Y
O PER AN D A2
O PCO DE R k REG ISTERS
l
k
n
O PC O D E R P k
R EG ISTERPAIR S
(R Pk)
M EM O R Y
O PER AN D
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
PAGEDPAGED
INDEXEDINDEXED
Addressing ModesAddressing Modes
O PCO DE
O FFSET
O PERAND
ADDRESS
DIRECT-PAG EREG ISTER
M EM O RY
O P C O D E X
B A S EA D D R E S S
O FFS E T
+
O P E R A N D
M E M O R Y
IN D E XR E G IS TE R S
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
BASED BASED BASED-INDEXEDBASED-INDEXED
RELATIVERELATIVE O PC O D E
D ISPLACEM EN T
+
PR O G R AM C O U N TER
M EM O R Y
O PER AN D
O PC O D E
M EM O R Y
C O N STAN T
+
BASER EG ISTER
O PER AN D
Addressing ModesAddressing Modes
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
STACKSTACK
Em pty
Em pty
Em pty
SP A
Em pty
Em pty
SP
PU SH A
A
B
Em pty
SP
PU SH B
Em pty
A
B
Em pty
SP
PO P B
Em pty
(c) (d)
Addressing ModesAddressing Modes
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
RISC/CISCRISC/CISC Controversy Controversy
RISC: Reduced Instruction Set ComputerRISC: Reduced Instruction Set Computer CISC: Complex Instruction Set ComputerCISC: Complex Instruction Set Computer Genesis of CISC architecture:Genesis of CISC architecture:
■ Implementing commonly used instructions in Implementing commonly used instructions in hardware can lead to significant performance hardware can lead to significant performance benefits.benefits.
■ For example, use of a FP processor can lead to For example, use of a FP processor can lead to performance improvements.performance improvements.
Genesis of RISC architecture:Genesis of RISC architecture:
■ The rarely used instructions can be eliminated to The rarely used instructions can be eliminated to save chip space --- on chip cache and large save chip space --- on chip cache and large number of registers can be provided.number of registers can be provided.
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
FeaturesFeatures of A CISC Processor of A CISC Processor
Rich instruction set:Rich instruction set:■ some simple, some very complex some simple, some very complex
Complex addressing modes:Complex addressing modes:■ Orthogonal addressingOrthogonal addressing (Every possible (Every possible
addressing mode for every instruction).addressing mode for every instruction).
Many instructions take multiple cycles:Many instructions take multiple cycles: Large variation in CPILarge variation in CPI
Instructions are of variable sizes Instructions are of variable sizes Small number of registersSmall number of registers Microcode control Microcode control No (or inefficient) pipelining No (or inefficient) pipelining
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
TheThe CISC Philosophy CISC Philosophy
One instruction could do the work of several One instruction could do the work of several instructions. instructions. ■ For example, a single instruction could load For example, a single instruction could load
two numbers to be added, add them, and then two numbers to be added, add them, and then store the result back to memory directly.store the result back to memory directly.
Many versions of the same instructions were Many versions of the same instructions were supported; supported; ■ Different versions did almost the same thing Different versions did almost the same thing
with minor changes. with minor changes. ■ For example, one version would read two For example, one version would read two
numbers from memory, and store the result in numbers from memory, and store the result in a register. Another version would read one a register. Another version would read one number from memory and the other from a number from memory and the other from a register and store the result to memory.register and store the result to memory.
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
FeaturesFeatures of a RISC Processor of a RISC Processor
Small number of instructionsSmall number of instructions
Small number of addressing modes Small number of addressing modes
Large number of registers (>32)Large number of registers (>32)
Instructions execute in one or two clock cycles Instructions execute in one or two clock cycles
Uniformed length instructions and fixed Uniformed length instructions and fixed instruction format. instruction format.
Register-Register Architecture:Register-Register Architecture: Separate memory instructions (load/store) Separate memory instructions (load/store)
Separate instruction/data cacheSeparate instruction/data cache
Hardwired controlHardwired control
PipeliningPipelining (Why CISC are not pipelined?) (Why CISC are not pipelined?)
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Early RISC ProcessorsEarly RISC Processors
1987 Sun SPARC1987 Sun SPARC
1990 IBM RS 60001990 IBM RS 6000
1996 IBM/Motorola PowerPC 1996 IBM/Motorola PowerPC
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
CISC vs. RISC CISC vs. RISC OrganizationsOrganizations
Main Memory Main Memory
MicroprogrammedControl Unit
MicroprogrammedControl Memory
CacheHardwared Control Unit
InstructionCache
DataCache
(a) CISC Organization (b) RISC Organization
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Why Does RISC Lead to Improved Performance?Why Does RISC Lead to Improved Performance?
Increased GPRs (Also, Register Windows)Increased GPRs (Also, Register Windows)
■ leads to decreased data traffic to memory.leads to decreased data traffic to memory.
Register-Register architecture leads to more Register-Register architecture leads to more uniform instructions:uniform instructions:
■ Efficient pipelining becomes possible.Efficient pipelining becomes possible.
However, large instruction memory traffic:However, large instruction memory traffic:
■ Because of larger number of instructions.Because of larger number of instructions.
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
RISC-Like instructions can be scheduled to achieve RISC-Like instructions can be scheduled to achieve efficiency:efficiency:■ Either by compiler orEither by compiler or■ By hardware (Dynamic instruction scheduling).By hardware (Dynamic instruction scheduling).
Suppose we need to add 2 values and store the Suppose we need to add 2 values and store the results back to memory.results back to memory.
In CISC:In CISC:■ It would be done using a single instruction.It would be done using a single instruction.
In RISC:In RISC:■ 4 instructions would be necessary.4 instructions would be necessary.
Why Does RISC Lead to Improved Performance?Why Does RISC Lead to Improved Performance?
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Case Study
MIPS Instruction Set architecture
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Registers– MIPS has 32 architected 32-bit registers– Effective use of registers is key to program performance
Data Transfer– 32 words in registers, millions of words in main memory– Instruction accesses memory via memory address– Address indexes memory, a large single-dimensional array
Alignment– Most architectures address individual bytes– Addresses of sequential words differ by 4– Words must always start at addresses that are multiples of 4
Operands in MIPS
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
r0: always 0r1: reserved for the assemblerr2, r3: function return valuesr4~r7: function call argumentsr8~r15: “caller-saved” temporariesr16~r23 “callee-saved” temporariesr24~r25 “caller-saved” temporariesr26, r27: reserved for the operating systemr28: global pointerr29: stack pointerr30: callee-saved temporariesr31: return address
Register Usage Conventions Registers– MIPS has 32 architected 32-bit registers– Effective use of registers is key to program performance
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Data Types in MIPS
Data and instructions are both represented in bits– 32-bit architectures employ 32-bit instructions– Combination of fields specifying operations/operands
MIPS operates on:32-bit (unsigned or 2’s complement) integers,32-bit (single precision floating point) real numbers, 64-bit (double precision floating point) real numbers; 32-bit words, bytes and half words can be loaded into GPRs After loading into GPRs, bytes and half words are either zero or sign bit expanded to fill the 32 bits; Only 32-bit units can be loaded into FPRs and 32-bit real numbers are stored in even numbered FPRs. 64-bit real numbers are stored in two consecutive FPRs, starting with even-numbered register.Floating-point numbers IEEE standard 754 float: 8-bit exponent, 23-bit significand double:11-bit exponent, 52-bit significand
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
MIPS Memory OrganizationMIPS Memory Organization
• MIPS supports byte addressability:
it implies that a byte is the smallest unit with its
address;
32-bit word has to start at byte address that is
multiple of 4;
Thus, 32-bit word at address 4n includes four bytes
with addresses: 4n, 4n+1, 4n+2, and 4n+3.
16-bit half word has to start at byte address that is
multiple of 2; Thus, 16-bit word at address 2n includes
two bytes with addresses: 2n and 2n+1.
it implies that an address is given as 32-bit unsigned
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
MIPS Instruction Formats
R-Type Instruction Fields– op: basic operation of instruction, called the opcode– rs, rt: first, second register source operand– rd: register destination operand– shamt: shift amount for shift instructions– funct: specifies a variant of the operation, called function code
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
• I-Type Instruction Fields– Opcode specifies instruction format
• J-Type Instruction Fields
MIPS Instruction Formats
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
MIPS Addressing ModesMIPS Addressing Modes
• Register Addressing– Operand is a register (e.g., add $s2, $s0, $s1)
• Base or Displacement Addressing– Operand is at memory location whose address is sum of register and constant (e.g., lw $s1, 8($s0))
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
• Immediate Addressing– Operand is constant within instruction (e.g., addi $s1, $s0, 4)
MIPS Addressing ModesMIPS Addressing Modes
• PC-Relative Addressing– Address is sum of PC and constant within instruction (e.g., beq $s0, $s1, 16)
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
MIPS Addressing ModesMIPS Addressing Modes
• Pseudo-direct Addressing– Address is 26 bits of constant within instruction concatenated with upper 6 bits of PC (e.g., j 1000)
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Survey of MIPS Instruction Set• Arithmetic
• Addition– add $s1, $s2, $s3 # $s1 = $2 + $s3, overflow detected– addu $s1, $s2, $s3 # $s1 = $s2 x $s3, overflow undetected
• Subtraction– sub $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow detected– subu $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow undetected
• Immediate/Constants– addi $s1, $s2, 100 # $s1 = $s2 + 100, overflow detected– addiu $s1, $s2, 100 # $s1 = $s2 + 100, overflow undetected
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
• Multiply– mult $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit signed product– multu $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit unsigned product
• Divide– div $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3– divu $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3, unsigned
• Ancillary– mfhi $s1 # $s1 = Hi, get copy of Hi– mflo $s1 # $s1 = Lo, get copy of low
• Arithmetic
Survey of MIPS Instruction Set
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
• Boolean Operations– and $s1, $s2, $s3 # $s1 = $s2 & $s3, bit-wise AND– or $s1, $s2, $s3 # $s1 = $s2 | $s3, bit-wise OR
• Immediate/Constants– andi $s1, $s2, 100 # $s1 = $s2 & 100, bit-wise AND– ori $s1, $s2, 100 # $s1 = $s2 | 100, bit-wise OR
• Shifting– sll $s1, $s2, 10 # $s1 = $s2 << 10, shift left– srl $s1, $s2, 10 # $s1 = $s2 >> 100, shift right
Survey of MIPS Instruction Set
• Logical
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
• Load Operations– lw $s1, 100($s2) # $s1 = Mem($s2+100), load word– lbu $s1, 100($s2) # $s1 = Mem($s2+100), load byte– lui $s1, 100 # $s1 = 100 * 2^16, load upper imm.
• Store Operations– sw $s1, 100($s2) # Mem[$s2+100] = $s1, store word– sb $s1, 100($s2) # Mem[$s2+100] = $s1, store byte
Survey of MIPS Instruction Set
• Data Transfer
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Survey of MIPS Instruction Set• Control Transfer
• Jump Operations– j 2500 # go to 10000– jal 2500 # $ra = PC+4, go to 10000, for procedure call– jr $ra # go to $ra, for procedure return
• Branch Operations– beq $s1, $s2, 25 # if($s1==$s2), go to PC+4+100, PC-relative– bne $s1, $s2, 25 # if($s1!=$s2), go to PC+4+100, PC-relative
• Comparison Operations– slt $s1, $s2, $s3 # if($s2<$s3) $s1=1, else $s1=0, 2’s comp.– slti $s1, $s2, 100 # if($s2<100), $s1=1, else $s1=0, 2’s comp.– sltu $s1, $s2, $s3 # if($s2<$s3), $s1=1, else $s1=0, unsigned– sltiu $s1, $s2, 100 # if($s2<100) $s1=1, else $s1=0, unsigned
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Survey of MIPS Instruction Set• Floating Point Operations
• Arithmetic– {add.s, sub.s, mul.s, div.s} $f2, $f4, $f6 # single-precision– {add.d, sub.d, mul.d, div.d} $f2, $f4, $f6 # double-precision– Floating-point registers are used in even-odd pairs, using evennumber register as its name
• Data Transfer– {lwc1, swc1} $f1, 100($s2)– Transfer data to/from floating-point register file
• Conditional Branch– {c.lt.s, c.lt.d} $f2, $f4 # if ($f2 < $f4), cond=1, else cond=0– {bclt, bclf} 25 # if (cond==1/0), go to PC+4+100
Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur
Thanks!Thanks!