Computer Organisation
Uploaded by: chatakondamanikanta
UNIT-1

COMPUTER TYPES:
A digital computer can be defined as a fast electronic calculating machine that accepts digitized input information, processes it according to a list of internally stored instructions, and produces the resulting output information.

Commonly used computer types are:
1. Personal Computers
2. Notebook Computers
3. Workstations
4. Enterprise Systems & Servers
5. Supercomputers
PERSONAL COMPUTERS:
1. They are the most commonly used computers.
2. They are used in homes, schools and business offices.
3. They are desktop computers: computers whose processing and storage units, visual display and audio output units, and keyboard can all be placed easily on an office desk.

NOTEBOOK COMPUTERS:
- They are personal computers in which all components (processing unit, storage unit and so on) are packaged into a single unit the size of a thin briefcase.
- They are also called laptop computers.

WORKSTATIONS:
- They have more computational power than a personal computer.
- They share some features of a PC but have high-resolution graphics input/output capability.
- They are generally used in engineering applications, especially for interactive design work.
- They are also used in design work such as animation and video editing.
- Example: Ultra 60 Workstation from Sun Microsystems
ENTERPRISE SYSTEMS:
- They are also called mainframes.
- They are generally used for business data processing in medium to large corporations that require more computing power and storage capacity than workstations can provide.
- Example: IBM S/390

SERVERS:
- They contain sizable storage units.
- They are capable of handling a large number of requests to access the data.
- They are widely used by the education, business and personal user communities.
- Requests and responses are usually transported over communication facilities.

SUPERCOMPUTERS:
- They are used for the large-scale numerical calculations required in applications such as weather forecasting and aircraft design and simulation.
- They are the most powerful computers; they are physically large and are used to process huge amounts of data.
- They are also used by nuclear scientists to analyze nuclear fission and nuclear fusion.
- Example: Cray T90
FUNCTIONAL UNITS OF A COMPUTER:
The functional units of a computer are:
- Arithmetic and logic unit (ALU)
- Control unit (CU)
- Memory unit
- Output unit
- Input unit
Arithmetic and Logic Unit (ALU):
- It is the unit of the system where most computer operations are carried out.
- For example, to perform an arithmetic or logical operation, the required operands are first fetched from memory into the processor; the ALU then carries out the operation, and the result may be stored in memory or retained in the processor for immediate use.
- When operands are brought into the processor, they are stored in high-speed storage elements called registers.
- Each register can store one word of data.
- Registers can be accessed faster than memory words.
Control Unit (CU):
- It coordinates the operations of the memory unit, ALU, input unit and output unit.
- It is the nerve center of the system: it sends control signals to the other units and senses their states.
- Timing signals are signals that determine when a given action is to take place.
- Data transfers between the processor and memory are controlled by the control unit through timing signals.
- The CU can be thought of as a well-defined, physically separate unit that interacts with the other units, but in practice most of the control circuitry is physically distributed throughout the machine.
Memory Unit:
- The function of the memory unit is to store programs and data.
- Memory is organized as a collection of memory words. Each memory word has a unique address.
- Memory words are addressed sequentially, starting with zero.
- Memory words can be accessed sequentially or randomly.
- In sequential access memory, words are accessed one after another in order: the word at address 4 can be reached only after passing over the words at addresses 0, 1, 2 and 3.
- In random access memory, any word can be accessed directly, and the access time is the same fixed amount for every word.
Output Unit:
- The main purpose of the output unit is to send the processed results to the external world.
- Commonly used output devices are monitors and printers.
- There are 2 types of printers: impact printers and non-impact printers.
- Impact printers create images by pressing (striking) against the paper. Examples: typewriter, dot matrix printer.
- Non-impact printers create images by methods such as spraying or photocopying. Examples: ink jet printers, laser printers.
- Dot matrix printers create the image using a mechanical device called a print head.
- Ink jet printers print using jets (streams) of ink.
Input Unit:
- Input units are the devices used for accepting input information, such as programs and data, from the external world or the user.
- Commonly used input devices are the keyboard, mouse and joystick.
- Some devices function as both an input unit and an output unit. For example, a touch screen is both an input device and an output device, so it is referred to as an I/O unit.
BASIC OPERATIONAL STEPS:
The following are the operating steps for the execution of a program. PC is the program counter, MAR the memory address register, MDR the memory data register and IR the instruction register.
1. The program is stored in memory through the input unit.
2. At the start of execution, the PC points to the first instruction of the program.
3. The contents of the PC are transferred to the MAR and a read control signal is sent to memory.
4. After the memory access time has elapsed, the addressed word is read out of memory and loaded into the MDR.
5. Next, the contents of the MDR are transferred to the IR; the instruction is now ready to be executed.
6. If the instruction involves an operation to be performed by the ALU, the required operands must be obtained.
7. If an operand resides in memory, it has to be fetched by sending its address to the MAR and initiating a read cycle.
8. When the operand has been read from memory into the MDR, it is transferred from the MDR to the ALU.
9. After one or more operands have been fetched in this way, the ALU can perform the desired operation.
10. If the result of this operation is to be stored in memory, the result is sent to the MDR.
11. The address of the location where the result is to be stored is sent to the MAR and a write cycle is initiated.
12. At some point during the execution of the current instruction, the contents of the PC are incremented so that it contains the address of the next instruction to be executed.
13. As soon as the execution of the current instruction is completed, a new instruction fetch may be started.

DATA REPRESENTATION
Information that a computer is dealing with:
* Data
  - Numeric data: numbers (integer, real)
  - Non-numeric data: letters, symbols
NUMERIC DATA REPRESENTATION

Number System
- Non-positional number system: Roman number system
- Positional number system
  - Each digit position has a value called a weight associated with it
  - Decimal, Octal, Hexadecimal, Binary

Base (or radix) R number
- Uses R distinct symbols for each digit
- R = 10: decimal number system, R = 2: binary, R = 8: octal, R = 16: hexadecimal
- Example: $A_R = a_{n-1} a_{n-2} \dots a_1 a_0 . a_{-1} \dots a_{-m}$
- The radix point (.) separates the integer portion and the fractional portion
- Value: $V(A_R) = \sum_{i=-m}^{n-1} a_i R^i$
Other kinds of information that a computer deals with:
* Relationship between data elements
  - Data structures: linear lists, trees, rings, etc.
* Program (instruction)
REPRESENTATION OF NUMBERS - POSITIONAL NUMBERS

Decimal   Binary   Octal   Hexadecimal
00        0000     00      0
01        0001     01      1
02        0010     02      2
03        0011     03      3
04        0100     04      4
05        0101     05      5
06        0110     06      6
07        0111     07      7
08        1000     10      8
09        1001     11      9
10        1010     12      A
11        1011     13      B
12        1100     14      C
13        1101     15      D
14        1110     16      E
15        1111     17      F
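Binary, octal, and hexadecimal conversion

Group the bits in threes (octal) or in fours (hexadecimal), starting from the right:

Octal    1   2   7   5   4   3
Binary   1 010 111 101 100 011

Binary   1010 1111 0110 0011
Hexa     A    F    6    3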
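The grouping in the example above can be checked with a short Python sketch. The function name `binary_to_base` and its `group` parameter are illustrative choices, not part of the original notes; the sketch assumes an unsigned binary integer given as a string.

```python
def binary_to_base(bits: str, group: int) -> str:
    """Convert a binary string to octal (group=3) or hexadecimal (group=4)
    by grouping bits from the right, as in the example above."""
    digits = "0123456789ABCDEF"
    # Left-pad with zeros so the length is a multiple of the group size.
    width = -(-len(bits) // group) * group
    padded = bits.zfill(width)
    # Split into fixed-size groups and translate each group to one digit.
    chunks = [padded[i:i + group] for i in range(0, width, group)]
    return "".join(digits[int(chunk, 2)] for chunk in chunks)


print(binary_to_base("1010111101100011", 3))  # octal digits
print(binary_to_base("1010111101100011", 4))  # hexadecimal digits
```

Because 8 = 2^3 and 16 = 2^4, each octal digit corresponds to exactly 3 bits and each hexadecimal digit to exactly 4 bits, which is why no arithmetic beyond the per-group lookup is needed.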
Data Types
CONVERSION OF BASES

Base R to Decimal Conversion

$A = a_{n-1} a_{n-2} a_{n-3} \dots a_0 . a_{-1} \dots a_{-m}$
$V(A) = \sum_k a_k R^k$

(736.4)8 = 7 x 8^2 + 3 x 8^1 + 6 x 8^0 + 4 x 8^-1 = 7 x 64 + 3 x 8 + 6 x 1 + 4/8 = (478.5)10
(110110)2 = ... = (54)10
(110.111)2 = ... = (6.875)10
(F3)16 = ... = (243)10
(0.325)6 = ... = (0.578703703...)10

Decimal to Base R number
- Separate the number into its integer and fraction parts and convert each part separately.
- Convert the integer part into the base R number → successive divisions by R and accumulation of the remainders.
- Convert the fraction part into the base R number → successive multiplications by R and accumulation of the integer digits.
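EXAMPLE
Convert (41.6875)10 to base 2.

Integer = 41 (divide by 2 repeatedly; the remainders, read from last to first, are the digits)
41 / 2 = 20   remainder 1
20 / 2 = 10   remainder 0
10 / 2 = 5    remainder 0
 5 / 2 = 2    remainder 1
 2 / 2 = 1    remainder 0
 1 / 2 = 0    remainder 1

Fraction = 0.6875 (multiply by 2 repeatedly; the integer digits, read from first to last, are the digits)
0.6875 x 2 = 1.3750   digit 1
0.3750 x 2 = 0.7500   digit 0
0.7500 x 2 = 1.5000   digit 1
0.5000 x 2 = 1.0000   digit 1

(41)10 = (101001)2
(0.6875)10 = (0.1011)2
(41.6875)10 = (101001.1011)2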
Exercise
- Convert (63)10 to base 5: (223)5
- Convert (1863)10 to base 8: (3507)8
- Convert (0.63671875)10 to hexadecimal: (0.A3)16
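The two conversion procedures (successive division for the integer part, successive multiplication for the fraction part) and the weighted sum for base R to decimal can be sketched in Python as follows. The function names are illustrative assumptions, and `fractions.Fraction` is used only to keep the fraction arithmetic exact.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def int_to_base(n: int, r: int) -> str:
    """Successive divisions by r; remainders are collected last-to-first."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, rem = divmod(n, r)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def frac_to_base(f: Fraction, r: int, max_digits: int = 12) -> str:
    """Successive multiplications by r; integer digits are collected in order.
    Stops after max_digits because some fractions never terminate."""
    out = []
    while f != 0 and len(out) < max_digits:
        f *= r
        digit = int(f)          # integer part becomes the next digit
        out.append(DIGITS[digit])
        f -= digit              # keep only the fraction for the next step
    return "".join(out)


def to_decimal(number: str, r: int) -> Fraction:
    """Weighted sum of the digits: V(A) = sum of a_k * r^k."""
    int_part, _, frac_part = number.partition(".")
    value = Fraction(0)
    for ch in int_part:
        value = value * r + DIGITS.index(ch)
    for k, ch in enumerate(frac_part, start=1):
        value += Fraction(DIGITS.index(ch), r ** k)
    return value


print(int_to_base(41, 2), frac_to_base(Fraction("0.6875"), 2))
print(float(to_decimal("736.4", 8)))
```

The digit limit in `frac_to_base` is a design choice for this sketch: a fraction such as (0.1)10 has no finite binary expansion, so the loop must be cut off somewhere.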
COMPLEMENT OF NUMBERS
Two types of complements for a base R number system: the R's complement and the (R-1)'s complement

The (R-1)'s Complement
Subtract each digit of the number from (R-1)
Example
- 9's complement of (835)10 is (164)10
- 1's complement of (1010)2 is (0101)2 (bit by bit complement operation)

The R's Complement
Add 1 to the low-order digit of its (R-1)'s complement
Example
- 10's complement of (835)10 is (164)10 + 1 = (165)10
- 2's complement of (1010)2 is (0101)2 + 1 = (0110)2
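A minimal Python sketch of the two complement rules, assuming the number is given as a digit string of fixed length in base r (r at most 16). The function names are illustrative and not taken from the notes.

```python
DIGITS = "0123456789ABCDEF"


def r_minus_1_complement(number: str, r: int) -> str:
    """Subtract each digit from (r - 1)."""
    return "".join(DIGITS[(r - 1) - DIGITS.index(d)] for d in number)


def r_complement(number: str, r: int) -> str:
    """Add 1 to the (r-1)'s complement, keeping the same number of digits."""
    n = len(number)
    value = (int(r_minus_1_complement(number, r), r) + 1) % (r ** n)
    # Convert back to an n-digit string in base r.
    out = []
    for _ in range(n):
        value, rem = divmod(value, r)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


print(r_minus_1_complement("835", 10), r_complement("835", 10))
print(r_minus_1_complement("1010", 2), r_complement("1010", 2))
```

The modulo step discards a carry out of the top digit, so the complement of an all-zero number stays all zeros.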
FIXED POINT NUMBERS

Numbers: fixed point numbers and floating point numbers

Binary Fixed-Point Representation

$X = x_n x_{n-1} x_{n-2} \dots x_1 x_0 . x_{-1} x_{-2} \dots x_{-m}$

Sign bit ($x_n$): 0 for positive and 1 for negative
Remaining bits: $x_{n-1} x_{n-2} \dots x_1 x_0 . x_{-1} x_{-2} \dots x_{-m}$
SIGNED NUMBERS

Both positive and negative numbers need to be represented. The following 3 representations are used:
- Signed magnitude representation
- Signed 1's complement representation
- Signed 2's complement representation

In general, in computers, fixed point numbers are represented either as an integer part only or as a fractional part only.

Example: Represent +9 and -9 as 7-bit binary numbers
Only one way to represent +9: 0 001001
Three different ways to represent -9:
- In signed magnitude: 1 001001
- In signed 1's complement: 1 110110
- In signed 2's complement: 1 110111
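The three encodings of -9 can be reproduced with a small Python sketch. The function names are illustrative, and the sketch assumes the value fits in the given number of bits (no range checking is done).

```python
def signed_magnitude(value: int, bits: int) -> str:
    """Sign bit followed by the magnitude in (bits - 1) bits."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")


def ones_complement(value: int, bits: int) -> str:
    """Negative numbers: complement every bit of the positive form."""
    if value >= 0:
        return format(value, f"0{bits}b")
    all_ones = (1 << bits) - 1
    return format(all_ones - abs(value), f"0{bits}b")


def twos_complement(value: int, bits: int) -> str:
    """Negative numbers: 1's complement plus 1 (same as masking to `bits` bits)."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


for encode in (signed_magnitude, ones_complement, twos_complement):
    print(encode.__name__, encode(9, 7), encode(-9, 7))
```

All three functions agree on positive numbers; they differ only in how the negative value is formed, which matches the example above.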
CHARACTERISTICS OF 3 DIFFERENT REPRESENTATIONS

Complement
- Signed magnitude: complement only the sign bit
- Signed 1's complement: complement all the bits, including the sign bit
- Signed 2's complement: take the 2's complement of the number, including its sign bit

Maximum and minimum representable numbers and representation of zero

$X = x_n x_{n-1} \dots x_0 . x_{-1} \dots x_{-m}$

Signed Magnitude
Max:  $2^n - 2^{-m}$        011 ... 11.11 ... 1
Min:  $-(2^n - 2^{-m})$     111 ... 11.11 ... 1
Zero: +0                    000 ... 00.00 ... 0
      -0                    100 ... 00.00 ... 0

Signed 1's Complement
Max:  $2^n - 2^{-m}$        011 ... 11.11 ... 1
Min:  $-(2^n - 2^{-m})$     100 ... 00.00 ... 0
Zero: +0                    000 ... 00.00 ... 0
      -0                    111 ... 11.11 ... 1

Signed 2's Complement
Max:  $2^n - 2^{-m}$        011 ... 11.11 ... 1
Min:  $-2^n$                100 ... 00.00 ... 0
Zero:  0                    000 ... 00.00 ... 0
2's COMPLEMENT REPRESENTATION WEIGHTS
- Signed 2's complement representation follows a "weight" scheme similar to that of unsigned numbers
- The sign bit has negative weight
- The other bits have regular weights

$X = x_n x_{n-1} \dots x_0$

$V(X) = -x_n \cdot 2^n + \sum_{i=0}^{n-1} x_i \cdot 2^i$
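The weight scheme can be checked with a few lines of Python. The function name `twos_complement_value` is an illustrative choice; it assumes an integer (no fraction bits) given as a bit string whose first character is the sign bit.

```python
def twos_complement_value(bits: str) -> int:
    """V(X) = -x_n * 2^n + sum of x_i * 2^i for i = 0 .. n-1."""
    n = len(bits) - 1                      # index of the sign bit
    value = -int(bits[0]) * (1 << n)       # sign bit carries negative weight
    for i, b in enumerate(reversed(bits[1:])):
        value += int(b) << i               # remaining bits carry regular weights
    return value


print(twos_complement_value("1110111"))    # 7-bit pattern used for -9 earlier
print(twos_complement_value("0001001"))
```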
ARITHMETIC ADDITION: SIGNED MAGNITUDE
[1] Compare their signs
[2] If the two signs are the same, ADD the two magnitudes - look out for an overflow
[3] If they are not the same, compare the relative magnitudes of the numbers and then SUBTRACT the smaller from the larger; the result takes the sign of the number with the larger magnitude --> a subtractor is needed in order to add
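ARITHMETIC ADDITION: SIGNED 2's COMPLEMENT
Add the two numbers, including their sign bits, and discard any carry out of the leftmost (sign) bit - look out for an overflow.

Example (sign bit followed by 4 bits)

      6    0 0110            -6    1 1010
  +)  9    0 1001        +)   9    0 1001
     15    0 1111             3    0 0011

      6    0 0110            -9    1 0111
  +) -9    1 0111        +)  -9    1 0111
     -3    1 1101           -18 (1)0 1110   overflow

      9    0 1001
  +)  9    0 1001
     18    1 0010   overflow: the 2 operands have the same sign and the result sign changes

Overflow condition, where $x_{n-1}$, $y_{n-1}$ and $s_{n-1}$ are the sign bits of the two operands and of the sum, $c_{n-1}$ is the carry into the sign position and $c_n$ is the carry out of it:

$x_{n-1} y_{n-1} s'_{n-1} + x'_{n-1} y'_{n-1} s_{n-1} = c_{n-1} \oplus c_n$

- $x'_{n-1} y'_{n-1} s_{n-1}$: two positive operands give a negative sum, as in 9 + 9
- $x_{n-1} y_{n-1} s'_{n-1}$: two negative operands give a positive sum, as in (-9) + (-9)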
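The addition rule and the overflow test (carry into the sign position XOR carry out of it) can be sketched in Python. The function name and the default width of 5 bits (sign plus 4 bits, as in the examples) are assumptions made for illustration.

```python
def add_twos_complement(x: int, y: int, bits: int = 5):
    """Add two signed integers as `bits`-wide 2's complement patterns.

    Returns (result bit string, overflow flag). The carry out of the sign
    bit is discarded; overflow is carry-in XOR carry-out of the sign bit.
    """
    mask = (1 << bits) - 1
    a, b = x & mask, y & mask              # 2's complement bit patterns
    total = a + b
    carry_out = total >> bits              # c_n: carry out of the sign position
    low_mask = (1 << (bits - 1)) - 1
    # c_(n-1): carry from the magnitude bits into the sign position
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    result = total & mask                  # discard the carry out
    return format(result, f"0{bits}b"), bool(carry_in ^ carry_out)


for x, y in [(6, 9), (-6, 9), (6, -9), (-9, -9), (9, 9)]:
    print(x, y, add_twos_complement(x, y))
```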
ARITHMETIC ADDITION: SIGNED MAGNITUDE (examples with 4-bit magnitudes)

6 + 9                               -6 + 9
      6   0110                            9   1001
  +)  9   1001                        -)  6   0110
     15   1111 -> 0 1111                  3   0011 -> 0 0011

6 + (-9)                            -6 + (-9)
      9   1001                            6   0110
  -)  6   0110                        +)  9   1001
     -3   0011 -> 1 0011                -15   1111 -> 1 1111

Overflow: 9 + 9 or (-9) + (-9)
      9   1001
  +)  9   1001
       (1)0010   overflow
ARITHMETIC ADDITION: SIGNED 1's COMPLEMENT

Example (sign bit followed by 4 bits)

      6    0 0110            -6    1 1001
  +) -9    1 0110        +)   9    0 1001
     -3    1 1100               (1)0 0010
                         +)            1   end-around carry
                              3    0 0011

  not overflow: $c_{n-1} \oplus c_n = 0$

     -9    1 0110             9    0 1001
  +) -9    1 0110        +)   9    0 1001
        (1)0 1100                  1 0010
  +)            1   end-around carry
           0 1101

  overflow in both cases: $c_{n-1} \oplus c_n = 1$
COMPARISON OF REPRESENTATIONS
* Ease of negative conversion: S+M > 1's complement > 2's complement
* Hardware
  - S+M: needs an adder and a subtractor for addition
  - 1's and 2's complement: need only an adder
* Speed of arithmetic: 2's complement > 1's complement (end-around carry)
* Recognition of zero: 2's complement is fast
Arithmetic Subtraction in 2's complement
Take the 2's complement of the subtrahend (including the sign bit) and add it to the minuend (including the sign bit):
( ± A ) - ( - B ) = ( ± A ) + B
( ± A ) - B = ( ± A ) + ( - B )

Arithmetic Addition in 1's complement
Add the two numbers, including their sign bits. If there is a carry out of the most significant (sign) bit, the result is incremented by 1 and the carry is discarded (end-around carry).
FLOATING POINT NUMBER REPRESENTATION
* The location of the fractional point is not fixed to a certain location
* The range of the representable numbers is wide

F = EM

  $m_n$     $e_k e_{k-1} \dots e_0$     $m_{n-1} m_{n-2} \dots m_0 . m_{-1} \dots m_{-m}$
  sign      exponent                    mantissa

- Mantissa: signed fixed point number, either an integer or a fractional number
- Exponent: designates the position of the radix point

Decimal value: $V(F) = V(M) \times R^{V(E)}$
M: Mantissa, E: Exponent, R: Radix
CHARACTERISTICS OF FLOATING POINT NUMBER REPRESENTATIONS

Normal Form
- There are many different floating point number representations of the same number → need for a unified representation in a given computer
- In normal form, the most significant position of the mantissa contains a non-zero digit

Representation of Zero
- Zero: Mantissa = 0
- Real zero: Mantissa = 0 and Exponent = smallest representable number, which is represented as 00 ... 0 → easily identified by the hardware
FLOATING POINT NUMBERS

Example (decimal)

  sign   mantissa    sign   exponent
  0      .1234567    0      04          ==> +.1234567 x 10^+04

Note: In floating point number representation, only the Mantissa (M) and Exponent (E) are explicitly represented. The Radix (R) and the position of the radix point are implied.

Example (binary)
A binary number +1001.11 in 16-bit floating point number representation (6-bit exponent and 10-bit fractional mantissa):

  Sign   Exponent   Mantissa
  0      0 00100    100111000
or
  0      0 00101    010011100
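To see that both bit patterns above stand for the same value, the following Python sketch decodes them using $V(F) = V(M) \times R^{V(E)}$ with R = 2. It assumes, as an interpretation of the example rather than something the notes state, that the 6-bit exponent is a sign bit followed by a 5-bit magnitude and that the mantissa bits are a pure fraction. The name `decode` is illustrative.

```python
from fractions import Fraction


def decode(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Decode the toy format: value = (+/-) 0.mantissa * 2^exponent.

    Assumption: `exponent` is sign bit + magnitude, `mantissa` is a fraction.
    """
    exp_magnitude = int(exponent[1:], 2)
    exp = -exp_magnitude if exponent[0] == "1" else exp_magnitude
    fraction = Fraction(int(mantissa, 2), 1 << len(mantissa))   # 0.mantissa
    value = fraction * Fraction(2) ** exp
    return -value if sign == "1" else value


print(decode("0", "000100", "100111000"))   # normalized form
print(decode("0", "000101", "010011100"))   # same value, not normalized
```

Only the first pattern is in normal form, since its most significant mantissa bit is non-zero.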
OTHER DECIMAL CODES

Decimal   BCD(8421)   2421   84-2-1   Excess-3
0         0000        0000   0000     0011
1         0001        0001   0111     0100
2         0010        0010   0110     0101
3         0011        0011   0101     0110
4         0100        0100   0100     0111
5         0101        1011   1011     1000
6         0110        1100   1010     1001
7         0111        1101   1001     1010
8         1000        1110   1000     1011
9         1001        1111   1111     1100

d3 d2 d1 d0: symbol in the codes

BCD (8421 code): d3 x 8 + d2 x 4 + d1 x 2 + d0 x 1
2421: d3 x 2 + d2 x 4 + d1 x 2 + d0 x 1
84-2-1: d3 x 8 + d2 x 4 + d1 x (-2) + d0 x (-1)
Excess-3: BCD + 3

Note: 8, 4, 2, -2, 1, -1 in this table are the weights associated with each bit position.

BCD: It is difficult to obtain the 9's complement. However, it is easily obtained with the other codes listed above → self-complementing codes
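The self-complementing property says that inverting every bit of the code word for digit d gives the code word for 9 - d. A short Python sketch can test this for each code in the table; the dictionary `CODES` and the function name are illustrative, with the code words copied from the table above.

```python
CODES = {
    "BCD":      ["0000", "0001", "0010", "0011", "0100",
                 "0101", "0110", "0111", "1000", "1001"],
    "2421":     ["0000", "0001", "0010", "0011", "0100",
                 "1011", "1100", "1101", "1110", "1111"],
    "84-2-1":   ["0000", "0111", "0110", "0101", "0100",
                 "1011", "1010", "1001", "1000", "1111"],
    "Excess-3": ["0011", "0100", "0101", "0110", "0111",
                 "1000", "1001", "1010", "1011", "1100"],
}


def is_self_complementing(code: list) -> bool:
    """True if bitwise inversion of digit d's word yields the word for 9 - d."""
    for d in range(10):
        inverted = "".join("1" if bit == "0" else "0" for bit in code[d])
        if inverted != code[9 - d]:
            return False
    return True


for name, code in CODES.items():
    print(name, is_self_complementing(code))
```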
GRAY CODE - ANALYSIS
Let $g_n g_{n-1} \dots g_1 g_0$ be the (n+1)-bit Gray code for the binary number $b_n b_{n-1} \dots b_1 b_0$. Then

$g_i = b_i \oplus b_{i+1}, \quad 0 \le i \le n-1$
$g_n = b_n$

and

$b_{n-i} = g_n \oplus g_{n-1} \oplus \dots \oplus g_{n-i}$
$b_n = g_n$

Reflection of Gray codes

1-bit   2-bit   3-bit   4-bit
0       0 0     0 00    0 000
1       0 1     0 01    0 001
        1 1     0 11    0 011
        1 0     0 10    0 010
                1 10    0 110
                1 11    0 111
                1 01    0 101
                1 00    0 100
                        1 100
                        1 101
                        1 111
                        1 110
                        1 010
                        1 011
                        1 001
                        1 000

Note: The Gray code has a reflection property
- easy to construct a table without calculation
- for any n: reflect case n-1 about a mirror at its bottom and prefix 0 and 1 to the top and bottom halves, respectively
GRAY CODE
* Characterized by having their representations of the binary integers differ in only one digit between consecutive integers
* Useful in some applications

4-bit Gray codes

Decimal   Gray          Binary
number    g3 g2 g1 g0   b3 b2 b1 b0
 0        0  0  0  0    0  0  0  0
 1        0  0  0  1    0  0  0  1
 2        0  0  1  1    0  0  1  0
 3        0  0  1  0    0  0  1  1
 4        0  1  1  0    0  1  0  0
 5        0  1  1  1    0  1  0  1
 6        0  1  0  1    0  1  1  0
 7        0  1  0  0    0  1  1  1
 8        1  1  0  0    1  0  0  0
 9        1  1  0  1    1  0  0  1
10        1  1  1  1    1  0  1  0
11        1  1  1  0    1  0  1  1
12        1  0  1  0    1  1  0  0
13        1  0  1  1    1  1  0  1
14        1  0  0  1    1  1  1  0
15        1  0  0  0    1  1  1  1
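The XOR relations between binary and Gray code translate directly into Python bit operations. This is a minimal sketch with illustrative function names; it works on non-negative integers rather than bit strings.

```python
def binary_to_gray(b: int) -> int:
    """g_i = b_i XOR b_(i+1): XOR the number with itself shifted right by one."""
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    """b_(n-i) = g_n XOR ... XOR g_(n-i): XOR together all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


for i in range(16):
    g = binary_to_gray(i)
    print(f"{i:2d}  gray {g:04b}  binary {gray_to_binary(g):04b}")
```

Printing the 16 rows should reproduce the 4-bit table above, and consecutive Gray code words differ in exactly one bit.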
CHARACTER REPRESENTATION: ASCII
ASCII (American Standard Code for Information Interchange) Code
ASCII is a 7-bit code. In the code table, the MSB (3 bits) select the column and the LSB (4 bits) select the row.
ERROR DETECTING CODES

Parity System
- Simplest method for error detection
- One parity bit attached to the information
- Even parity and odd parity

Even Parity
- One bit is attached to the information so that the total number of 1 bits is an even number
    1011001 0
    1010010 1

Odd Parity
- One bit is attached to the information so that the total number of 1 bits is an odd number
    1011001 1
    1010010 0

PARITY BIT GENERATION
For $b_6 b_5 \dots b_0$ (7-bit information), the even parity bit is
$b_{even} = b_6 \oplus b_5 \oplus \dots \oplus b_0$
and the odd parity bit is
$b_{odd} = b_{even} \oplus 1 = b'_{even}$
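The parity equations amount to XOR-ing all the information bits together. The Python sketch below generates both parity bits and also checks a received word for an even-parity error; the function names are illustrative and the information is assumed to be given as a bit string.

```python
def even_parity_bit(bits: str) -> int:
    """b_even = XOR of all information bits (1 if the count of 1s is odd)."""
    parity = 0
    for bit in bits:
        parity ^= int(bit)
    return parity


def odd_parity_bit(bits: str) -> int:
    """b_odd is the complement of b_even."""
    return even_parity_bit(bits) ^ 1


def even_parity_error(word: str) -> int:
    """XOR of information bits plus parity bit: 1 signals an error."""
    return even_parity_bit(word)


for info in ("1011001", "1010010"):
    print(info, even_parity_bit(info), odd_parity_bit(info))
```

A single parity bit detects any odd number of bit errors but cannot detect an even number of errors, since those leave the XOR unchanged.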
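LSB        MSB (3 bits)
(4 bits)   0     1     2     3     4     5     6     7
0          NUL   DLE   SP    0     @     P     `     p
1          SOH   DC1   !     1     A     Q     a     q
2          STX   DC2   "     2     B     R     b     r
3          ETX   DC3   #     3     C     S     c     s
4          EOT   DC4   $     4     D     T     d     t
5          ENQ   NAK   %     5     E     U     e     u
6          ACK   SYN   &     6     F     V     f     v
7          BEL   ETB   '     7     G     W     g     w
8          BS    CAN   (     8     H     X     h     x
9          HT    EM    )     9     I     Y     i     y
A          LF    SUB   *     :     J     Z     j     z
B          VT    ESC   +     ;     K     [     k     {
C          FF    FS    ,     <     L     \     l     |
D          CR    GS    -     =     M     ]     m     }
E          SO    RS    .     >     N     ^     n     ~
F          SI    US    /     ?     O     _     o     DEL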
CONTROL CHARACTER REPRESENTATION (ASCII)

NUL  Null
SOH  Start of Heading (CC)
STX  Start of Text (CC)
ETX  End of Text (CC)
EOT  End of Transmission (CC)
ENQ  Enquiry (CC)
ACK  Acknowledge (CC)
BEL  Bell
BS   Backspace (FE)
HT   Horizontal Tab (FE)
LF   Line Feed (FE)
VT   Vertical Tab (FE)
FF   Form Feed (FE)
CR   Carriage Return (FE)
SO   Shift Out
SI   Shift In
DLE  Data Link Escape (CC)
DC1  Device Control 1
DC2  Device Control 2
DC3  Device Control 3
DC4  Device Control 4
NAK  Negative Acknowledge (CC)
SYN  Synchronous Idle (CC)
ETB  End of Transmission Block (CC)
CAN  Cancel
EM   End of Medium
SUB  Substitute
ESC  Escape
FS   File Separator (IS)
GS   Group Separator (IS)
RS   Record Separator (IS)
US   Unit Separator (IS)
DEL  Delete

(CC) Communication Control
(FE) Format Effector
(IS) Information Separator
PARITY GENERATOR AND PARITY CHECKER

[Figure: Parity generator circuit (even parity): inputs b6 to b0, output b_even]
[Figure: Parity checker: inputs b6 to b0 and b_even, output is the even parity error indicator]
UNIT-2

REGISTER TRANSFER AND MICROOPERATIONS
• Register Transfer Language
• Register Transfer
• Bus and Memory Transfers
• Arithmetic Microoperations
• Logic Microoperations
• Shift Microoperations
• Arithmetic Logic Shift Unit
MICROOPERATIONS
The operations on the data in registers are called microoperations. The functions built into registers are examples of microoperations:
- Shift
- Load
- Clear
- Increment
- ...
ORGANIZATION OF A DIGITAL SYSTEM
Definition of the (internal) organization of a computer:
- Set of registers and their functions
- Microoperation set: the set of allowable microoperations provided by the organization of the computer
- Control signals that initiate the sequence of microoperations (to perform the functions)
REGISTER TRANSFER LANGUAGE
- Rather than specifying a digital system in words, a specific notation is used: register transfer language
- For any function of the computer, the register transfer language can be used to describe the (sequence of) microoperations
- Register transfer language is
  - a symbolic language
  - a convenient tool for describing the internal organization of digital computers
  - also usable to facilitate the design process of digital systems
DESIGNATION OF REGISTERS
- Registers are designated by capital letters, sometimes followed by numbers (e.g., A, R13, IR)
- Often the names indicate function:
  MAR - memory address register
  PC - program counter
  IR - instruction register
- Registers and their contents can be viewed and represented in various ways
  - A register can be viewed as a single entity (e.g. MAR)
  - Registers may also be represented showing the bits of data they contain
- Designation of a register can refer to
  - a register
  - a portion of a register
  - a bit of a register

[Figure: Common ways of drawing the block diagram of a register: a register as a single box (R1); showing individual bits (7 6 5 4 3 2 1 0); numbering of bits (R2, bits 15 to 0); subfields (PC(H) for bits 15-8 and PC(L) for bits 7-0)]
REGISTER TRANSFER
• Copying the contents of one register to another is a register transfer
• A register transfer is indicated as
    R2 ← R1
  – In this case the contents of register R1 are copied (loaded) into register R2
A register transfer such as
    R3 ← R5
implies that the digital system has
- the data lines from the source register (R5) to the destination register (R3)
- parallel load in the destination register (R3)
- control lines to perform the action
CONTROL FUNCTIONS
- Often actions need to occur only if a certain condition is true
- This is similar to an "if" statement in a programming language
- In digital systems, this is often done via a control signal, called a control function
- If the signal is 1, the action takes place
- This is represented as:
    P: R2 ← R1
  which means "if P = 1, then load the contents of register R1 into register R2", i.e., if (P = 1) then (R2 ← R1)
HARDWARE IMPLEMENTATION OF CONTROLLED TRANSFERS
Implementation of controlled transfer
    P: R2 ← R1

[Figure: Block diagram: the control circuit produces P, which drives the Load input of R2; R1 is connected to R2 by n data lines; R2 is clocked]
[Figure: Timing diagram: Clock and Load waveforms between t and t+1; the transfer occurs at the clock edge at t+1]

- The same clock controls the circuits that generate the control function and the destination register
- Registers are assumed to use positive-edge-triggered flip-flops
BASIC SYMBOLS FOR REGISTER TRANSFERS

Symbols                      Description                               Examples
Capital letters & numerals   Denotes a register                        MAR, R2
Parentheses ( )              Denotes a part of a register              R2(0-7), R2(L)
Arrow ←                      Denotes transfer of information           R2 ← R1
Colon :                      Denotes termination of control function   P:
Comma ,                      Separates two microoperations             A ← B, B ← A
SIMULTANEOUS OPERATIONS
• If two or more operations are to occur simultaneously, they are separated with commas
    P: R3 ← R5, MAR ← IR
• Here, if the control function P = 1, load the contents of R5 into R3, and at the same time (clock), load the contents of register IR into register MAR
BUS AND BUS TRANSFER
A bus is a path (a group of wires) over which information is transferred, from any of several sources to any of several destinations.

From a register to bus: BUS ← R

[Figure: A 4-line bus built from four 4x1 multiplexers. Registers A, B, C and D each have bits 1 to 4; multiplexer k receives bit k of each register (e.g. B1, C1, D1) and drives bus line k; the select inputs x and y choose which register is placed on the bus lines]
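TRANSFER FROM BUS TO A DESTINATION REGISTER

[Figure: The bus lines feed registers R0, R1, R2 and R3; a 2x4 decoder with select inputs z and w and enable input E drives the Load inputs (D0 to D3) of the registers]

Three-State Bus Buffers
[Figure: Three-state buffer with normal input A and control input C: output Y = A if C = 1, high impedance if C = 0]

Bus line with three-state buffers
[Figure: Bus line for bit 0: A0, B0, C0 and D0 each pass through a three-state buffer onto the line; a 2x4 decoder with select inputs S0, S1 and an enable input activates one of the buffers 0 to 3]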
BUS TRANSFER IN RTL
• Depending on whether the bus is to be mentioned explicitly or not, a register transfer can be indicated as either
    R2 ← R1
  or
    BUS ← R1, R2 ← BUS
• In the former case the bus is implicit, but in the latter, it is explicitly indicated
SUMMARY OF REGISTER TRANSFER MICROOPERATIONS

A ← B                   Transfer content of reg. B into reg. A
AR ← DR(AD)             Transfer content of AD portion of reg. DR into reg. AR
A ← constant            Transfer a binary constant into reg. A
ABUS ← R1, R2 ← ABUS    Transfer content of R1 into bus A and, at the same time, transfer content of bus A into R2
AR                      Address register
DR                      Data register
M[R]                    Memory word specified by reg. R
M                       Equivalent to M[AR]
DR ← M                  Memory read operation: transfers content of memory word specified by AR into DR
M ← DR                  Memory write operation: transfers content of DR into memory word specified by AR
ARITHMETIC MICROOPERATIONS
• Computer system microoperations are of four types:
  - Register transfer microoperations
  - Arithmetic microoperations
  - Logic microoperations
  - Shift microoperations

The basic arithmetic microoperations are
- Addition
- Subtraction
- Increment
- Decrement
The additional arithmetic microoperations are
- Add with carry
- Subtract with borrow
- Transfer/Load
- etc.

Summary of Typical Arithmetic Micro-Operations

R3 ← R1 + R2         Contents of R1 plus R2 transferred to R3
R3 ← R1 - R2         Contents of R1 minus R2 transferred to R3
R2 ← R2'             Complement the contents of R2
R2 ← R2' + 1         2's complement the contents of R2 (negate)
R3 ← R1 + R2' + 1    Subtraction
R1 ← R1 + 1          Increment
R1 ← R1 - 1          Decrement
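BINARY ADDER / SUBTRACTOR / INCREMENTER

[Figure: Binary Adder: four full adders (FA) in cascade; stage i takes A_i, B_i and carry C_i and produces sum S_i and carry C_(i+1), from C0 up to C4]
[Figure: Binary Adder-Subtractor: the same chain of four full adders (inputs A0-A3, B0-B3, carries C0-C4, sums S0-S3) with an additional mode input M]
[Figure: Binary Incrementer: four half adders (HA, inputs x and y, outputs C and S) in cascade; the first adds 1 to A0 and each carry feeds the next stage, producing S0 to S3 and C4]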
ARITHMETIC CIRCUIT

[Figure: 4-bit arithmetic circuit: four full adders (FA) with inputs X_i and Y_i, carries C_i to C_(i+1) and outputs D0 to D3; Cin feeds C0 and Cout is C4. X_i is A_i, and each Y_i comes from a 4x1 MUX selected by S1 S0 whose inputs 0 to 3 are B_i, B_i', 0 and 1]

S1  S0  Cin   Y    Output            Microoperation
0   0   0     B    D = A + B         Add
0   0   1     B    D = A + B + 1     Add with carry
0   1   0     B'   D = A + B'        Subtract with borrow
0   1   1     B'   D = A + B' + 1    Subtract
1   0   0     0    D = A             Transfer A
1   0   1     0    D = A + 1         Increment A
1   1   0     1    D = A - 1         Decrement A
1   1   1     1    D = A             Transfer A
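The function table can be simulated in Python by modelling the multiplexers as a choice of Y (B, B', all 0s or all 1s) followed by a single addition D = A + Y + Cin. This is a behavioural sketch under those assumptions, not a gate-level model; the function name and the 4-bit default width are illustrative.

```python
def arithmetic_circuit(a: int, b: int, s1: int, s0: int, cin: int, bits: int = 4):
    """Return (D, Cout) for the arithmetic circuit's function table."""
    mask = (1 << bits) - 1
    # The 4x1 multiplexers select the Y input of the adder from S1 S0.
    y = {
        (0, 0): b,             # Y = B
        (0, 1): ~b & mask,     # Y = B' (1's complement of B)
        (1, 0): 0,             # Y = 0
        (1, 1): mask,          # Y = all 1s, i.e. -1 in 2's complement
    }[(s1, s0)]
    total = a + y + cin        # parallel adder: D = A + Y + Cin
    return total & mask, total >> bits


a, b = 0b0110, 0b0011          # A = 6, B = 3
for s1 in (0, 1):
    for s0 in (0, 1):
        for cin in (0, 1):
            d, cout = arithmetic_circuit(a, b, s1, s0, cin)
            print(s1, s0, cin, format(d, "04b"), cout)
```

With A = 6 and B = 3 the eight rows give 9, 10, 2, 3, 6, 7, 5 and 6, matching add, add with carry, subtract with borrow, subtract, transfer, increment, decrement and transfer.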
LOGIC MICROOPERATIONS
- Specify binary operations on the strings of bits in registers
- Logic microoperations are bit-wise operations, i.e., they work on the individual bits of data
  - useful for bit manipulations on binary data
  - useful for making logical decisions based on the bit value
- There are, in principle, 16 different logic functions that can be defined over two binary input variables

  A  B   F0  F1  F2  ...  F13  F14  F15
  0  0   0   0   0   ...  1    1    1
  0  1   0   0   0   ...  1    1    1
  1  0   0   0   1   ...  0    1    1
  1  1   0   1   0   ...  1    0    1

- However, most systems only implement four of these: AND (∧), OR (∨), XOR (⊕), Complement/NOT
- The others can be created from combinations of these
LIST OF LOGIC MICROOPERATIONS
- 16 different logic operations with 2 binary variables
- n binary variables → $2^{2^n}$ functions

Truth tables for 16 functions of 2 variables and the corresponding 16 logic microoperations

x  0 0 1 1     Boolean           Micro-
y  0 1 0 1     Function          Operations       Name
   0 0 0 0     F0 = 0            F ← 0            Clear
   0 0 0 1     F1 = xy           F ← A ∧ B        AND
   0 0 1 0     F2 = xy'          F ← A ∧ B'
   0 0 1 1     F3 = x            F ← A            Transfer A
   0 1 0 0     F4 = x'y          F ← A' ∧ B
   0 1 0 1     F5 = y            F ← B            Transfer B
   0 1 1 0     F6 = x ⊕ y        F ← A ⊕ B        Exclusive-OR
   0 1 1 1     F7 = x + y        F ← A ∨ B        OR
   1 0 0 0     F8 = (x + y)'     F ← (A ∨ B)'     NOR
   1 0 0 1     F9 = (x ⊕ y)'     F ← (A ⊕ B)'     Exclusive-NOR
   1 0 1 0     F10 = y'          F ← B'           Complement B
   1 0 1 1     F11 = x + y'      F ← A ∨ B'
   1 1 0 0     F12 = x'          F ← A'           Complement A
   1 1 0 1     F13 = x' + y      F ← A' ∨ B
   1 1 1 0     F14 = (xy)'       F ← (A ∧ B)'     NAND
   1 1 1 1     F15 = 1           F ← all 1's      Set to all 1's
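HARDWARE IMPLEMENTATION OF LOGIC MICROOPERATIONS

[Figure: One stage of the logic circuit: inputs A_i and B_i feed the logic gates whose outputs go to inputs 0 to 3 of a 4x1 MUX; the select lines S1 and S0 choose the output F_i]

Function table
S1  S0   Output       Microoperation
0   0    F = A ∧ B    AND
0   1    F = A ∨ B    OR
1   0    F = A ⊕ B    XOR
1   1    F = A'       Complement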
APPLICATIONS OF LOGIC MICROOPERATIONS
- Logic microoperations can be used to manipulate individual bits or a portion of a word in a register
- Consider the data in a register A. Another register, B, holds the bit data that will be used to modify the contents of A

Selective-set          A ← A + B
Selective-complement   A ← A ⊕ B
Selective-clear        A ← A • B'
Mask (Delete)          A ← A • B
Clear                  A ← A ⊕ B
Insert                 A ← (A • B) + C
Compare                A ← A ⊕ B
. . .

(In these microoperations, + denotes bitwise OR and • denotes bitwise AND.)
SELECTIVE SET
• In a selective set operation, the bit pattern in B is used to set certain bits in A
    1 1 0 0   A(t)
    1 0 1 0   B
    1 1 1 0   A(t+1)   (A ← A + B)
• If a bit in B is set to 1, that same position in A gets set to 1, otherwise that bit in A keeps its previous value

SELECTIVE COMPLEMENT
• In a selective complement operation, the bit pattern in B is used to complement certain bits in A
    1 1 0 0   A(t)
    1 0 1 0   B
    0 1 1 0   A(t+1)   (A ← A ⊕ B)
• If a bit in B is set to 1, that same position in A gets complemented from its original value, otherwise it is unchanged

SELECTIVE CLEAR
• In a selective clear operation, the bit pattern in B is used to clear certain bits in A
    1 1 0 0   A(t)
    1 0 1 0   B
    0 1 0 0   A(t+1)   (A ← A • B')
• If a bit in B is set to 1, that same position in A gets set to 0, otherwise it is unchanged

MASK OPERATION
• In a mask operation, the bit pattern in B is used to clear certain bits in A
    1 1 0 0   A(t)
    1 0 1 0   B
    1 0 0 0   A(t+1)   (A ← A • B)
• If a bit in B is set to 0, that same position in A gets set to 0, otherwise it is unchanged

CLEAR OPERATION
• In a clear operation, if the bits in the same position in A and B are the same, they are cleared in A, otherwise they are set in A
    1 1 0 0   A(t)
    1 0 1 0   B
    0 1 1 0   A(t+1)   (A ← A ⊕ B)
INSERT OPERATION
• An insert operation is used to introduce a specific bit pattern into the A register, leaving the other bit positions unchanged
• This is done as
  – A mask operation to clear the desired bit positions, followed by
  – An OR operation to introduce the new bits into the desired positions
  – Example
    » Suppose you wanted to introduce 1010 into the low order four bits of A:
        1101 1000 1011 0001   A (Original)
        1101 1000 1011 1010   A (Desired)
    » Steps:
        1101 1000 1011 0001   A (Original)
        1111 1111 1111 0000   Mask
        1101 1000 1011 0000   A (Intermediate)
        0000 0000 0000 1010   Added bits
        1101 1000 1011 1010   A (Desired)
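Each of these applications maps onto one or two Python bitwise operators. The sketch below uses illustrative function names and assumes registers are modelled as non-negative integers of a known width.

```python
def selective_set(a: int, b: int) -> int:
    return a | b                               # A <- A OR B


def selective_complement(a: int, b: int) -> int:
    return a ^ b                               # A <- A XOR B


def selective_clear(a: int, b: int, width: int = 4) -> int:
    return a & ~b & ((1 << width) - 1)         # A <- A AND B'


def mask(a: int, b: int) -> int:
    return a & b                               # A <- A AND B


def insert(a: int, mask_bits: int, new_bits: int) -> int:
    # Mask first to clear the target positions, then OR in the new bits.
    return (a & mask_bits) | new_bits


A, B = 0b1100, 0b1010
print(format(selective_set(A, B), "04b"))
print(format(selective_complement(A, B), "04b"))
print(format(selective_clear(A, B), "04b"))
print(format(mask(A, B), "04b"))
print(format(insert(0b1101100010110001, 0b1111111111110000, 0b1010), "016b"))
```

The explicit width in `selective_clear` is needed only because Python integers are unbounded, so `~b` would otherwise produce a negative number.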
SHIFT MICROOPERATIONS
There are three types of shifts
- Logical shift
- Circular shift
- Arithmetic shift
What differentiates them is the information that goes into the serial input.

[Figure: A right shift operation: the serial input enters at the left end of the register]
[Figure: A left shift operation: the serial input enters at the right end of the register]
LOGICAL SHIFT
In a logical shift the serial input to the shift is a 0.

[Figure: A right logical shift operation: 0 enters at the left end]
[Figure: A left logical shift operation: 0 enters at the right end]

In a Register Transfer Language, the following notation is used
- shl for a logical shift left
- shr for a logical shift right
Examples:
    R2 ← shr R2
    R3 ← shl R3
CIRCULAR SHIFT
In a circular shift the serial input is the bit that is shifted out of the other end of the register.

[Figure: A right circular shift operation]
[Figure: A left circular shift operation]

In RTL, the following notation is used
- cil for a circular shift left
- cir for a circular shift right
Examples:
    R2 ← cir R2
    R3 ← cil R3
ARITHMETIC SHIFT
- An arithmetic shift is meant for signed binary numbers (integers)
- An arithmetic left shift multiplies a signed number by two
- An arithmetic right shift divides a signed number by two
- The main distinction of an arithmetic shift is that it must keep the sign of the number the same as it performs the multiplication or division

[Figure: A right arithmetic shift operation: the sign bit is kept in place and also shifted right]
[Figure: A left arithmetic shift operation: 0 enters at the right end]

A left arithmetic shift operation must be checked for overflow: before the shift, if the leftmost two bits differ, the shift will result in an overflow (V = 1).

In RTL, the following notation is used
- ashl for an arithmetic shift left
- ashr for an arithmetic shift right
Examples:
    R2 ← ashr R2
    R3 ← ashl R3
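The six shift microoperations can be modelled in Python on an n-bit register held as a non-negative integer. The function names follow the RTL mnemonics used above; the explicit width parameter `n` and the (result, overflow) return value of `ashl` are choices made for this sketch.

```python
def shl(x: int, n: int) -> int:
    """Logical shift left: 0 enters at the right, leftmost bit is lost."""
    return (x << 1) & ((1 << n) - 1)


def shr(x: int, n: int) -> int:
    """Logical shift right: 0 enters at the left."""
    return x >> 1


def cil(x: int, n: int) -> int:
    """Circular shift left: the bit shifted out on the left re-enters on the right."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def cir(x: int, n: int) -> int:
    """Circular shift right: the bit shifted out on the right re-enters on the left."""
    return (x >> 1) | ((x & 1) << (n - 1))


def ashr(x: int, n: int) -> int:
    """Arithmetic shift right: the sign bit is kept and copied to the right."""
    return (x >> 1) | (x & (1 << (n - 1)))


def ashl(x: int, n: int):
    """Arithmetic shift left; overflow if the two leftmost bits differ before the shift."""
    overflow = ((x >> (n - 1)) ^ (x >> (n - 2))) & 1
    return (x << 1) & ((1 << n) - 1), bool(overflow)


x = 0b1011
for op in (shl, shr, cil, cir, ashr):
    print(op.__name__, format(op(x, 4), "04b"))
print("ashl", ashl(x, 4))
```

For example, `ashr` applied to 1010 (-6 in 4-bit 2's complement) gives 1101 (-3), while `ashl` applied to 1011 reports an overflow because the sign bit would change.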
HARDWARE IMPLEMENTATION OF SHIFT MICROOPERATIONS

[Figure: 4-bit combinational shifter: four 2x1 multiplexers with outputs H0 to H3 and a common Select input S (0 for shift right (down), 1 for shift left (up)); the data inputs are A0 to A3 together with the serial inputs IR and IL]
ARITHMETIC LOGIC SHIFT UNIT

[Figure: One stage of the arithmetic logic shift unit: an arithmetic circuit (inputs A_i, B_i, C_i; outputs D_i and C_(i+1)) and a logic circuit (output E_i) feed inputs 0 and 1 of a 4x1 MUX; inputs 2 and 3 take A_(i+1) for shr and A_(i-1) for shl; S3 and S2 select the MUX output F_i, while S1 and S0 go to the arithmetic and logic circuits]

S3  S2  S1  S0  Cin   Operation          Function
0   0   0   0   0     F = A              Transfer A
0   0   0   0   1     F = A + 1          Increment A
0   0   0   1   0     F = A + B          Addition
0   0   0   1   1     F = A + B + 1      Add with carry
0   0   1   0   0     F = A + B'         Subtract with borrow
0   0   1   0   1     F = A + B' + 1     Subtraction
0   0   1   1   0     F = A - 1          Decrement A
0   0   1   1   1     F = A              Transfer A
0   1   0   0   X     F = A ∧ B          AND
0   1   0   1   X     F = A ∨ B          OR
0   1   1   0   X     F = A ⊕ B          XOR
0   1   1   1   X     F = A'             Complement A
1   0   X   X   X     F = shr A          Shift right A into F
1   1   X   X   X     F = shl A          Shift left A into F
Instruction Codes
• Every different processor type has its own design (different registers, buses, microoperations, machine instructions, etc)
• A modern processor is a very complex device
• It contains
– Many registers
– Multiple arithmetic units, for both integer and floating point calculations
– The ability to pipeline several consecutive instructions to speed execution
– Etc.
• However, to understand how processors work, we will start with a simplified processor model
• This is similar to what real processors were like ~25 years ago
• M. Morris Mano introduces a simple processor model he calls the Basic Computer
• We will use this to introduce processor organization and the relationship of the RTL model to the higher level computer processor
• The Basic Computer has two components, a processor and memory
• The memory has 4096 words in it
– 4096 = 2^12, so it takes 12 bits to select a word in memory
• Each word is 16 bits long
• Program
– A sequence of (machine) instructions
• (Machine) Instruction
– A group of bits that tell the computer to perform a specific operation (a sequence of microoperations)
• The instructions of a program, along with any needed data, are stored in memory
• The CPU reads the next instruction from memory
• It is placed in an Instruction Register (IR)
• Control circuitry in the control unit then translates the instruction into the sequence of microoperations necessary to implement it
• Since the memory words, and hence the instructions, are 16 bits long, with 12 bits used for the address and 1 bit for the addressing mode, that leaves 3 bits for the instruction’s opcode
INSTRUCTION FORMAT
A computer instruction is often divided into two parts
- An opcode (Operation Code) that specifies the operation for that instruction
- An address that specifies the registers and/or locations in memory to use for that operation
In the Basic Computer, since the memory contains 4096 (= 2^12) words, we need 12 bits to specify which memory address this instruction will use
In the Basic Computer, bit 15 of the instruction specifies the addressing mode (0: direct addressing, 1: indirect addressing)
Since the memory words, and hence the instructions, are 16 bits long, that leaves 3 bits for the instruction’s opcode
Instruction Format
 15   14      12   11            0
| I |   Opcode   |    Address     |
I = addressing mode
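As a quick illustration of this format, the three fields can be pulled out of a 16-bit word with shifts and masks. The helper names `decode` and `encode` are hypothetical and exist only for this sketch.

```python
def decode(word):
    """Split a 16-bit Basic Computer instruction into (I, opcode, address)."""
    i = (word >> 15) & 0b1        # bit 15: addressing mode
    opcode = (word >> 12) & 0b111  # bits 14-12: operation code
    address = word & 0xFFF         # bits 11-0: memory address
    return i, opcode, address

def encode(i, opcode, address):
    """Pack the three fields back into one 16-bit word."""
    return ((i & 1) << 15) | ((opcode & 0b111) << 12) | (address & 0xFFF)

if __name__ == "__main__":
    # 0x91C9: I = 1 (indirect), opcode 001, address 0x1C9 = 457
    print(decode(0x91C9))
    print(hex(encode(0, 0b001, 457)))
```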
ADDRESSING MODES
The address field of an instruction can represent either
- Direct address: the address in memory of the data to use (the address of the operand), or
- Indirect address: the address in memory of the address in memory of the data to use
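[Figure: direct and indirect addressing.
Direct addressing: the instruction "0 ADD 457" is stored at address 22; the operand is at address 457 and is added to AC.
Indirect addressing: the instruction "1 ADD 300" is stored at address 35; location 300 contains 1350; the operand is at address 1350 and is added to AC.]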
• Effective Address (EA)
– The address that can be directly used without modification to access an operand for a computation-type instruction, or as the target address for a branch-type instruction
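The difference between the two modes comes down to one extra memory read. The sketch below uses the addresses from the figure (457, 300, 1350); the operand values 7 and 5 stored in the hypothetical `memory` dictionary are made up for the example.

```python
def effective_address(i, address, memory):
    """Return the effective address for mode bit i (0: direct, 1: indirect)."""
    if i == 0:
        return address          # the address field is the operand address
    return memory[address]      # the address field points at the operand address

if __name__ == "__main__":
    # Addresses follow the figure; the operand values are assumed.
    memory = {457: 7, 300: 1350, 1350: 5}
    ea_direct = effective_address(0, 457, memory)
    ea_indirect = effective_address(1, 300, memory)
    print(ea_direct, memory[ea_direct])
    print(ea_indirect, memory[ea_indirect])
```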
PROCESSOR REGISTERS
• A processor has many registers to hold instructions, addresses, data, etc
• The processor has a register, the Program Counter (PC), that holds the memory address of the next instruction to get
– Since the memory in the Basic Computer only has 4096 locations, the PC only needs 12 bits
• In direct or indirect addressing, the processor needs to keep track of what locations in memory it is addressing: the Address Register (AR) is used for this
– The AR is a 12 bit register in the Basic Computer
Registers in the Basic Computer
[Figure: the CPU registers PC (bits 11-0), AR (11-0), IR (15-0), TR (15-0), DR (15-0), AC (15-0), INPR (7-0) and OUTR (7-0), next to a 4096 x 16 memory]
List of BC Registers
Register  Bits  Name                  Function
DR        16    Data Register         Holds memory operand
AR        12    Address Register      Holds address for memory
AC        16    Accumulator           Processor register
IR        16    Instruction Register  Holds instruction code
PC        12    Program Counter       Holds address of instruction
TR        16    Temporary Register    Holds temporary data
INPR      8     Input Register        Holds input character
OUTR      8     Output Register       Holds output character
• When an operand is found, using either direct or indirect addressing, it is placed in the Data Register (DR). The processor then uses this value as data for its operation
• The Basic Computer has a single general purpose register – the Accumulator (AC)
• The significance of a general purpose register is that it can be referred to in instructions
– e.g. load AC with the contents of a specific memory location; store the contents of AC into a specified memory location
• Often a processor will need a scratch register to store intermediate results or other temporary data; in the Basic Computer this is the Temporary Register (TR)
• The Basic Computer uses a very simple model of input/output (I/O) operations
– Input devices are considered to send 8 bits of character data to the processor
– The processor can send 8 bits of character data to output devices
• The Input Register (INPR) holds an 8 bit character received from an input device
• The Output Register (OUTR) holds an 8 bit character to be sent to an output device
COMMON BUS SYSTEM
• The registers in the Basic Computer are connected using a bus
• This gives a savings in circuitry over complete connections between registers
[Figure: common bus system of the Basic Computer. The outputs of AR (1), PC (2), DR (3), AC (4), IR (5), TR (6) and the 4096 x 16 memory unit (7) are connected to the 16-bit common bus, and S2 S1 S0 select which one drives it. AR, PC, DR, AC and TR have LD, INR and CLR inputs; IR and OUTR have LD only. The memory unit has Address, Read and Write inputs. The ALU, with the E flip-flop, feeds AC, and INPR is an input to the ALU. All registers share the clock.]
Three control lines, S2, S1, and S0, control which register the bus selects as its input
Either one of the registers will have its load signal activated, or the memory will have its write signal activated
- This determines where the data from the bus gets loaded
The 12-bit registers, AR and PC, have 0’s loaded onto the bus in the high order 4 bit positions
When the 8-bit register OUTR is loaded from the bus, the data comes from the low order 8 bits on the bus
S2 S1 S0   Register
0  0  0    x
0  0  1    AR
0  1  0    PC
0  1  1    DR
1  0  0    AC
1  0  1    IR
1  1  0    TR
1  1  1    Memory
BASIC COMPUTER INSTRUCTIONS
Basic Computer Instruction Format

Memory-Reference Instructions (OP-code = 000 ~ 110)
 15   14      12   11            0
| I |   Opcode   |    Address     |

Register-Reference Instructions (OP-code = 111, I = 0)
 15   14      12   11                  0
| 0 |  1  1  1   | Register operation   |

Input-Output Instructions (OP-code = 111, I = 1)
 15   14      12   11                  0
| 1 |  1  1  1   |    I/O operation     |
          Hex Code
Symbol  I = 0   I = 1   Description
AND     0xxx    8xxx    AND memory word to AC
ADD     1xxx    9xxx    Add memory word to AC
LDA     2xxx    Axxx    Load AC from memory
STA     3xxx    Bxxx    Store content of AC into memory
BUN     4xxx    Cxxx    Branch unconditionally
BSA     5xxx    Dxxx    Branch and save return address
ISZ     6xxx    Exxx    Increment and skip if zero

CLA     7800            Clear AC
CLE     7400            Clear E
CMA     7200            Complement AC
CME     7100            Complement E
CIR     7080            Circulate right AC and E
CIL     7040            Circulate left AC and E
INC     7020            Increment AC
SPA     7010            Skip next instr. if AC is positive
SNA     7008            Skip next instr. if AC is negative
SZA     7004            Skip next instr. if AC is zero
SZE     7002            Skip next instr. if E is zero
HLT     7001            Halt computer

INP     F800            Input character to AC
OUT     F400            Output character from AC
SKI     F200            Skip on input flag
SKO     F100            Skip on output flag
ION     F080            Interrupt on
IOF     F040            Interrupt off
INSTRUCTION SET COMPLETENESS
A computer should have a set of instructions so that the user can construct machine language programs to evaluate any function that is known to be computable.
• Instruction Types
Functional Instructions
- Arithmetic, logic, and shift instructions
- ADD, CMA, INC, CIR, CIL, AND, CLA
Transfer Instructions
- Data transfers between the main memory and the processor registers
- LDA, STA
Control Instructions
- Program sequencing and control
- BUN, BSA, ISZ
Input/Output Instructions
- Input and output
- INP, OUT
CONTROL UNIT
• Control unit (CU) of a processor translates from machine instructions to the control signals for the microoperations that implement them
• Control units are implemented in one of two ways
• Hardwired Control
– CU is made up of sequential and combinational circuits to generate the control signals
• Microprogrammed Control
– A control memory on the processor contains microprograms that activate the necessary control signals
• We will consider a hardwired implementation of the control unit for the Basic Computer
TIMING AND CONTROL
Control unit of Basic Computer
[Figure: control unit of the Basic Computer. Bits 14-12 of the instruction register (IR) feed a 3 x 8 decoder with outputs D0 to D7; bit 15 goes to the I flip-flop; bits 11-0 and other inputs go to the combinational control logic. A 4-bit sequence counter (SC), with Increment (INR), Clear (CLR) and Clock inputs, drives a 4 x 16 decoder with outputs T0 to T15. The combinational control logic produces the control signals.]
TIMING SIGNALS
- Generated by the 4-bit sequence counter and the 4 x 16 decoder
- The SC can be incremented or cleared.
- Example: T0, T1, T2, T3, T4, T0, T1, . . .
Assume: At time T4, SC is cleared to 0 if decoder output D3 is active.
D3T4: SC ← 0
[Figure: timing diagram showing Clock, T0, T1, T2, T3, T4, D3 and CLR SC over the sequence T0 T1 T2 T3 T4 T0]
INSTRUCTION CYCLE
• In Basic Computer, a machine instruction is executed in the following cycle:
1. Fetch an instruction from memory
2. Decode the instruction
3. Read the effective address from memory if the instruction has an indirect address
4. Execute the instruction
• After an instruction is executed, the cycle starts again at step 1, for the next instruction
• Note: Every different processor has its own (different) instruction cycle
FETCH and DECODE
• Fetch and Decode
T0: AR ← PC (S2S1S0 = 010, T0 = 1)
T1: IR ← M[AR], PC ← PC + 1 (S2S1S0 = 111, T1 = 1)
T2: D0, . . . , D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)
[Figure: register transfers for the fetch phase. T0 places PC (2) on the common bus and loads AR; T1 reads the memory unit (7) onto the bus, loads IR (5) and increments PC.]
DETERMINE THE TYPE OF INSTRUCTION
D7'IT3: AR ← M[AR]
D7'I'T3: Nothing
D7I'T3: Execute a register-reference instr.
D7IT3: Execute an input-output instr.
[Figure: flowchart for the instruction cycle. Start, SC ← 0. T0: AR ← PC. T1: IR ← M[AR], PC ← PC + 1. T2: decode opcode in IR(12-14), AR ← IR(0-11), I ← IR(15). If D7 = 1 (register or I/O): at T3 execute an input-output instruction when I = 1 or a register-reference instruction when I = 0, then SC ← 0. If D7 = 0 (memory-reference): at T3, AR ← M[AR] when I = 1 (indirect) or nothing when I = 0 (direct); then execute the memory-reference instruction starting at T4 and SC ← 0.]
REGISTER REFERENCE INSTRUCTIONS
Register Reference Instructions are identified when D7 = 1, I = 0
- Register Ref. Instr. is specified in b0 ~ b11 of IR
- Execution starts with timing signal T3
r = D7 I' T3 => Register Reference Instruction
Bi = IR(i), i = 0, 1, 2, ..., 11
      r:     SC ← 0
CLA   rB11:  AC ← 0
CLE   rB10:  E ← 0
CMA   rB9:   AC ← AC'
CME   rB8:   E ← E'
CIR   rB7:   AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL   rB6:   AC ← shl AC, AC(0) ← E, E ← AC(15)
INC   rB5:   AC ← AC + 1
SPA   rB4:   if (AC(15) = 0) then (PC ← PC + 1)
SNA   rB3:   if (AC(15) = 1) then (PC ← PC + 1)
SZA   rB2:   if (AC = 0) then (PC ← PC + 1)
SZE   rB1:   if (E = 0) then (PC ← PC + 1)
HLT   rB0:   S ← 0 (S is a start-stop flip-flop)
MEMORY REFERENCE INSTRUCTIONS
- The effective address of the instruction is in AR and was placed there during timing signal T2 when I = 0, or during timing signal T3 when I = 1
- Memory cycle is assumed to be short enough to complete in a CPU cycle
- The execution of MR instruction starts with T4
AND to AC
D0T4: DR ← M[AR]                     Read operand
D0T5: AC ← AC ∧ DR, SC ← 0           AND with AC
ADD to AC
D1T4: DR ← M[AR]                     Read operand
D1T5: AC ← AC + DR, E ← Cout, SC ← 0 Add to AC and store carry in E
LDA: Load to AC
D2T4: DR ← M[AR]
D2T5: AC ← DR, SC ← 0
STA: Store AC
D3T4: M[AR] ← AC, SC ← 0
BUN: Branch Unconditionally
D4T4: PC ← AR, SC ← 0
BSA: Branch and Save Return Address
M[AR] ← PC, PC ← AR + 1
D5T4: M[AR] ← PC, AR ← AR + 1
D5T5: PC ← AR, SC ← 0
[Figure: BSA example. The instruction "0 BSA 135" is at address 20 and the next instruction at 21, so at time T4 PC = 21 and AR = 135. After execution, location 135 holds the return address 21 and PC = 136, the start of the subroutine. The subroutine ends with "1 BUN 135", an indirect branch back through location 135.]
ISZ: Increment and Skip-if-Zero
D6T4: DR ← M[AR]
D6T5: DR ← DR + 1
D6T6: M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0
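The net effect of each memory-reference instruction can be sketched as a single Python function. This is a behavioural sketch only: it collapses the T4 to T6 timing steps into one call, assumes AR already holds the effective address, and the names `execute_mri`, `cpu` and `mem` are hypothetical.

```python
WORD = 0xFFFF  # 16-bit data mask
ADDR = 0x0FFF  # 12-bit address mask

def execute_mri(opcode, cpu, mem):
    """Apply the net effect of one memory-reference instruction.

    cpu is a dict with keys AC, E, PC, AR; mem maps addresses to 16-bit words.
    """
    ar = cpu["AR"]
    if opcode == 0:                      # AND
        cpu["AC"] &= mem[ar]
    elif opcode == 1:                    # ADD, carry out goes to E
        total = cpu["AC"] + mem[ar]
        cpu["AC"], cpu["E"] = total & WORD, total >> 16
    elif opcode == 2:                    # LDA
        cpu["AC"] = mem[ar]
    elif opcode == 3:                    # STA
        mem[ar] = cpu["AC"]
    elif opcode == 4:                    # BUN
        cpu["PC"] = ar
    elif opcode == 5:                    # BSA: save return address, enter subroutine
        mem[ar] = cpu["PC"]
        cpu["PC"] = (ar + 1) & ADDR
    elif opcode == 6:                    # ISZ
        mem[ar] = (mem[ar] + 1) & WORD
        if mem[ar] == 0:
            cpu["PC"] = (cpu["PC"] + 1) & ADDR
    else:
        raise ValueError("not a memory-reference opcode")

if __name__ == "__main__":
    # BSA example from the figure: instruction at 20, so PC = 21 and AR = 135.
    cpu = {"AC": 0, "E": 0, "PC": 21, "AR": 135}
    mem = {}
    execute_mri(5, cpu, mem)
    print(mem[135], cpu["PC"])
```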
FLOWCHART FOR MEMORY REFERENCE INSTRUCTIONS
[Figure: flowchart of the memory-reference instructions AND, ADD, LDA, STA, BUN, BSA and ISZ, showing the microoperations performed for each at D0T4 through D6T6]
INPUT-OUTPUT AND INTERRUPT
Input-Output Configuration
A Terminal with a Keyboard and a Printer
INPR   Input register - 8 bits
OUTR   Output register - 8 bits
FGI    Input flag - 1 bit
FGO    Output flag - 1 bit
IEN    Interrupt enable - 1 bit
- The terminal sends and receives serial information
- The serial info. from the keyboard is shifted into INPR
- The serial info. for the printer is stored in the OUTR
- INPR and OUTR communicate with the terminal serially and with the AC in parallel.
- The flags are needed to synchronize the timing difference between I/O device and the computer
[Figure: the input-output terminal (keyboard and printer) is connected over a serial communications path to the serial communication interface (receiver interface and transmitter interface), which connects over a parallel communications path to the computer registers and flip-flops: OUTR with FGO, INPR with FGI, and AC.]
INPUT-OUTPUT INSTRUCTIONS
D7IT3 = p
IR(i) = Bi, i = 6, …, 11
      p:     SC ← 0                          Clear SC
INP   pB11:  AC(0-7) ← INPR, FGI ← 0         Input char. to AC
OUT   pB10:  OUTR ← AC(0-7), FGO ← 0         Output char. from AC
SKI   pB9:   if (FGI = 1) then (PC ← PC + 1) Skip on input flag
SKO   pB8:   if (FGO = 1) then (PC ← PC + 1) Skip on output flag
PROGRAM CONTROLLED DATA TRANSFER
-- CPU --                              -- I/O Device --
/* Input */  /* Initially FGI = 0 */
loop: If FGI = 0 goto loop             loop: If FGI = 1 goto loop
      AC ← INPR, FGI ← 0                     INPR ← new data, FGI ← 1
/* Output */ /* Initially FGO = 1 */
loop: If FGO = 0 goto loop             loop: If FGO = 1 goto loop
      OUTR ← AC, FGO ← 0                     consume OUTR, FGO ← 1
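[Figure: flowcharts for program-controlled input and output.
Input: start input, FGI ← 0; wait while FGI = 0; then AC ← INPR; repeat while there are more characters, otherwise END.
Output: start output, FGO ← 0; AC ← data; wait while FGO = 0; then OUTR ← AC; repeat while there are more characters, otherwise END.]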
FLOWCHART FOR INTERRUPT CYCLE
R = Interrupt f/f
[Figure: flowchart for the interrupt cycle.
R = 0 (instruction cycle): fetch and decode instructions, execute instructions; then if IEN = 1 and (FGI = 1 or FGO = 1), R ← 1.
R = 1 (interrupt cycle): store return address in location 0, M[0] ← PC; branch to location 1, PC ← 1; IEN ← 0, R ← 0.]
- The interrupt cycle is a HW implementation of a branch and save return address operation.
- At the beginning of the next instruction cycle, the instruction that is read from memory is in address 1.
- At memory address 1, the programmer must store a branch instruction that sends the control to an interrupt service routine
- The instruction that returns the control to the original program is "indirect BUN 0"
ION   pB7:   IEN ← 1   Interrupt enable on
IOF   pB6:   IEN ← 0   Interrupt enable off
COMPLETE COMPUTER DESCRIPTION
Flowchart of Operations
start: SC ← 0, IEN ← 0, R ← 0
R = 0 (Instruction Cycle)
R'T0: AR ← PC
R'T1: IR ← M[AR], PC ← PC + 1
R'T2: AR ← IR(0~11), I ← IR(15), D0...D7 ← Decode IR(12 ~ 14)
D7 = 1 (Register or I/O)
D7IT3: Execute I/O Instruction (I = 1)
D7I'T3: Execute RR Instruction (I = 0)
D7 = 0 (Memory Ref)
D7'IT3: AR ← M[AR] (I = 1, Indir)
D7'I'T3: Idle (I = 0, Dir)
D7'T4: Execute MR Instruction
R = 1 (Interrupt Cycle)
RT0: AR ← 0, TR ← PC
RT1: M[AR] ← TR, PC ← 0
RT2: PC ← PC + 1, IEN ← 0, R ← 0, SC ← 0
Microoperations
Register-Reference
D7I'T3 = r (Common to all register-reference instr)
IR(i) = Bi (i = 0, 1, 2, ..., 11)
      r:     SC ← 0
CLA   rB11:  AC ← 0
CLE   rB10:  E ← 0
CMA   rB9:   AC ← AC'
CME   rB8:   E ← E'
CIR   rB7:   AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL   rB6:   AC ← shl AC, AC(0) ← E, E ← AC(15)
INC   rB5:   AC ← AC + 1
SPA   rB4:   If (AC(15) = 0) then (PC ← PC + 1)
SNA   rB3:   If (AC(15) = 1) then (PC ← PC + 1)
SZA   rB2:   If (AC = 0) then (PC ← PC + 1)
SZE   rB1:   If (E = 0) then (PC ← PC + 1)
HLT   rB0:   S ← 0
Input-Output
D7IT3 = p (Common to all input-output instructions)
IR(i) = Bi (i = 6, 7, 8, 9, 10, 11)
      p:     SC ← 0
INP   pB11:  AC(0-7) ← INPR, FGI ← 0
OUT   pB10:  OUTR ← AC(0-7), FGO ← 0
SKI   pB9:   If (FGI = 1) then (PC ← PC + 1)
SKO   pB8:   If (FGO = 1) then (PC ← PC + 1)
ION   pB7:   IEN ← 1
IOF   pB6:   IEN ← 0
REGISTERS
• In Basic Computer, there is only one general purpose register, the Accumulator (AC)
• In modern CPUs, there are many general purpose registers
• It is advantageous to have many registers
– Transfers between registers within the processor are relatively fast
– Going “off the processor” to access memory is much slower
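GENERAL REGISTER ORGANIZATION
[Figure: general register organization. Registers R1 to R7 and the external Input feed two multiplexers, selected by SELA and SELB, that drive the A bus and the B bus into the ALU. OPR selects the ALU operation. The ALU result goes to the Output and back to the registers, where a 3 x 8 decoder driven by SELD activates one of the 7 Load lines. All registers share the Clock.]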
OPERATION OF CONTROL UNIT
The control unit directs the information flow through the ALU by
- Selecting various Components in the system
- Selecting the Function of ALU
Example: R1 ← R2 + R3
[1] MUX A selector (SELA): BUS A ← R2
[2] MUX B selector (SELB): BUS B ← R3
[3] ALU operation selector (OPR): ALU to ADD
[4] Decoder destination selector (SELD): R1 ← Out Bus
Control Word
| SELA | SELB | SELD | OPR |
   3      3      3      5    (bits)
Encoding of register selection fields
Binary Code   SELA    SELB    SELD
000           Input   Input   None
001           R1      R1      R1
010           R2      R2      R2
011           R3      R3      R3
100           R4      R4      R4
101           R5      R5      R5
110           R6      R6      R6
111           R7      R7      R7
ALU CONTROL
Encoding of ALU operations
OPR Select   Operation        Symbol
00000        Transfer A       TSFA
00001        Increment A      INCA
00010        ADD A + B        ADD
00101        Subtract A - B   SUB
00110        Decrement A      DECA
01000        AND A and B      AND
01010        OR A and B       OR
01100        XOR A and B      XOR
01110        Complement A     COMA
10000        Shift right A    SHRA
11000        Shift left A     SHLA
Examples of ALU Microoperations
                   Symbolic Designation
Microoperation     SELA    SELB    SELD    OPR     Control Word
R1 ← R2 - R3       R2      R3      R1      SUB     010 011 001 00101
R4 ← R4 ∨ R5       R4      R5      R4      OR      100 101 100 01010
R6 ← R6 + 1        R6      -       R6      INCA    110 000 110 00001
R7 ← R1            R1      -       R7      TSFA    001 000 111 00000
Output ← R2        R2      -       None    TSFA    010 000 000 00000
Output ← Input     Input   -       None    TSFA    000 000 000 00000
R4 ← shl R4        R4      -       R4      SHLA    100 000 100 11000
R5 ← 0             R5      R5      R5      XOR     101 101 101 01100
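Building a control word is a matter of looking up each field in the two encoding tables and concatenating the bits. The sketch below does exactly that; the function name `control_word` and the use of "-" for an unused field (encoded as 000) are assumptions made for the example.

```python
# Register selection codes (SELA, SELB, SELD share the same 3-bit encoding).
REG = {"Input": 0, "None": 0, "-": 0}
REG.update({f"R{k}": k for k in range(1, 8)})

# 5-bit ALU operation codes from the OPR table.
OPR = {
    "TSFA": 0b00000, "INCA": 0b00001, "ADD": 0b00010, "SUB": 0b00101,
    "DECA": 0b00110, "AND": 0b01000, "OR": 0b01010, "XOR": 0b01100,
    "COMA": 0b01110, "SHRA": 0b10000, "SHLA": 0b11000,
}

def control_word(sela, selb, seld, opr):
    """Return the 14-bit control word as a string grouped by field."""
    return f"{REG[sela]:03b} {REG[selb]:03b} {REG[seld]:03b} {OPR[opr]:05b}"

if __name__ == "__main__":
    print(control_word("R2", "R3", "R1", "SUB"))   # R1 <- R2 - R3
    print(control_word("R5", "R5", "R5", "XOR"))   # R5 <- 0
```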
REGISTER STACK ORGANIZATION
Stack
- Very useful feature for nested subroutines, nested interrupt services
- Also efficient for arithmetic expression evaluation
- Storage which can be accessed in LIFO order
- Pointer: SP
- Only PUSH and POP operations are applicable
Register Stack
[Figure: 64-word register stack with addresses 0 to 63, holding the items A, B and C; a 6-bit stack pointer SP; the flags FULL and EMPTY; and the data register DR]
Push, Pop operations
/* Initially, SP = 0, EMPTY = 1, FULL = 0 */
PUSH                             POP
SP ← SP + 1                      DR ← M[SP]
M[SP] ← DR                       SP ← SP - 1
If (SP = 0) then (FULL ← 1)      If (SP = 0) then (EMPTY ← 1)
EMPTY ← 0                        FULL ← 0
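The PUSH and POP microoperations translate almost line for line into code. In the sketch below the class name `RegisterStack` and the exceptions raised on a full or empty stack are assumptions; the microoperations themselves leave that check to the user of the flags.

```python
class RegisterStack:
    """64-word register stack with a 6-bit stack pointer and FULL/EMPTY flags."""

    def __init__(self):
        self.mem = [0] * 64
        self.sp = 0
        self.empty = True
        self.full = False

    def push(self, dr):
        if self.full:
            raise OverflowError("stack is full")   # assumed guard, not in the RTL
        self.sp = (self.sp + 1) & 0x3F              # SP <- SP + 1 (wraps at 64)
        self.mem[self.sp] = dr                      # M[SP] <- DR
        if self.sp == 0:
            self.full = True
        self.empty = False

    def pop(self):
        if self.empty:
            raise IndexError("stack is empty")     # assumed guard, not in the RTL
        dr = self.mem[self.sp]                      # DR <- M[SP]
        self.sp = (self.sp - 1) & 0x3F              # SP <- SP - 1
        if self.sp == 0:
            self.empty = True
        self.full = False
        return dr

if __name__ == "__main__":
    s = RegisterStack()
    for item in "ABC":
        s.push(item)
    print(s.sp, s.pop(), s.sp)
```

Because SP is 6 bits wide, the 64th push wraps SP back to 0, which is why SP = 0 after a push means FULL while SP = 0 after a pop means EMPTY.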
MEMORY STACK ORGANIZATION
Memory with Program, Data, and Stack Segments
- A portion of memory is used as a stack with a processor register as a stack pointer
- PUSH: SP ← SP - 1
        M[SP] ← DR
- POP:  DR ← M[SP]
        SP ← SP + 1
[Figure: memory divided into a program segment (instructions) pointed to by PC, a data segment (operands) pointed to by AR, and a stack segment pointed to by SP. The marked addresses are 1000, 3000 and 3997 to 4001; the stack grows toward lower addresses.]
- Most computers do not provide hardware to check stack overflow (full stack) or underflow (empty stack); this must be done in software
REVERSE POLISH NOTATION
Arithmetic Expressions: A + B
A + B    Infix notation
+ A B    Prefix or Polish notation
A B +    Postfix or reverse Polish notation
- The reverse Polish notation is very suitable for stack manipulation
Evaluation of Arithmetic Expressions
Any arithmetic expression can be expressed in parenthesis-free Polish notation, including reverse Polish notation
(3 * 4) + (5 * 6)  →  3 4 * 5 6 * +
Stack contents while evaluating 3 4 * 5 6 * + (top of stack on the right):
Token:   3     4      *     5      6        *       +
Stack:   3     3 4    12    12 5   12 5 6   12 30   42
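The stack trace above follows a simple rule: push operands, and on an operator pop two values, apply it, and push the result. A minimal Python sketch of that rule is shown here; the function name `eval_rpn` and the restriction to integer operands and the operators +, - and * are choices made for the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_rpn(expression):
    """Evaluate a space-separated reverse Polish expression of integers."""
    stack = []
    for token in expression.split():
        if token in OPS:
            right = stack.pop()           # top of stack is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    return stack.pop()

if __name__ == "__main__":
    print(eval_rpn("3 4 * 5 6 * +"))
```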
PROCESSOR ORGANIZATION
• In general, most processors are organized in one of 3 ways
– Single register (Accumulator) organization
» Basic Computer is a good example
» Accumulator is the only general purpose register
– General register organization
» Used by most modern computer processors
» Any of the registers can be used as the source or destination for computer operations
– Stack organization
» All operations are done using the hardware stack
INSTRUCTION FORMAT
Instruction Fields
OP-code field - specifies the operation to be performed
Address field - designates memory address(es) or a processor register(s)
Mode field - determines how the address field is to be interpreted (to get effective address or the operand)
The number of address fields in the instruction format depends on the internal organization of CPU
The three most common CPU organizations:
Single accumulator organization:
ADD X             /* AC ← AC + M[X] */
General register organization:
ADD R1, R2, R3    /* R1 ← R2 + R3 */
ADD R1, R2        /* R1 ← R1 + R2 */
MOV R1, R2        /* R1 ← R2 */
ADD R1, X         /* R1 ← R1 + M[X] */
Stack organization:
PUSH X            /* TOS ← M[X] */
ADD
THREE- AND TWO-ADDRESS INSTRUCTIONS
Three-Address Instructions
Program to evaluate X = (A + B) * (C + D):
ADD R1, A, B      /* R1 ← M[A] + M[B] */
ADD R2, C, D      /* R2 ← M[C] + M[D] */
MUL X, R1, R2     /* M[X] ← R1 * R2 */
- Results in short programs
- Instruction becomes long (many bits)
Two-Address Instructions
Program to evaluate X = (A + B) * (C + D):
MOV R1, A         /* R1 ← M[A] */
ADD R1, B         /* R1 ← R1 + M[B] */
MOV R2, C         /* R2 ← M[C] */
ADD R2, D         /* R2 ← R2 + M[D] */
MUL R1, R2        /* R1 ← R1 * R2 */
MOV X, R1         /* M[X] ← R1 */
ONE- AND ZERO-ADDRESS INSTRUCTIONS
One-Address Instructions
- Use an implied AC register for all data manipulation
- Program to evaluate X = (A + B) * (C + D):
LOAD A            /* AC ← M[A] */
ADD B             /* AC ← AC + M[B] */
STORE T           /* M[T] ← AC */
LOAD C            /* AC ← M[C] */
ADD D             /* AC ← AC + M[D] */
MUL T             /* AC ← AC * M[T] */
STORE X           /* M[X] ← AC */
Zero-Address Instructions
- Can be found in a stack-organized computer
- Program to evaluate X = (A + B) * (C + D):
PUSH A            /* TOS ← A */
PUSH B            /* TOS ← B */
ADD               /* TOS ← (A + B) */
PUSH C            /* TOS ← C */
PUSH D            /* TOS ← D */
ADD               /* TOS ← (C + D) */
MUL               /* TOS ← (C + D) * (A + B) */
POP X             /* M[X] ← TOS */
» For example, an OR instruction will pop the two top elements from the stack, do a logical OR on them, and push the result on the stack
ADDRESSING MODES
• Addressing Modes
* Specifies a rule for interpreting or modifying the address field of the instruction (before the operand is actually referenced)
* Variety of addressing modes
- to give programming flexibility to the user
- to use the bits in the address field of the instruction efficiently
TYPES OF ADDRESSING MODES
• Implied Mode
Addresses of the operands are specified implicitly in the definition of the instruction
- No need to specify address in the instruction
- EA = AC, or EA = Stack[SP]
- Examples from Basic Computer: CLA, CME, INP
• Immediate Mode
Instead of specifying the address of the operand, the operand itself is specified
- No need to specify address in the instruction
- However, the operand itself needs to be specified
- Sometimes requires more bits than the address
- Fast to acquire an operand
• Register Mode
Address specified in the instruction is the register address
- Designated operand needs to be in a register
- Shorter address than the memory address
- Saving address field in the instruction
- Faster to acquire an operand than the memory addressing
- EA = IR(R) (IR(R): Register field of IR)
• Register Indirect Mode
Instruction specifies a register which contains the memory address of the operand
- Saving instruction bits since register address is shorter than the memory address
- Slower to acquire an operand than both the register addressing or memory addressing
- EA = [IR(R)] ([x]: Content of x)
• Autoincrement or Autodecrement Mode
- When the address in the register is used to access memory, the value in the register is incremented or decremented by 1 automatically
• Direct Address Mode
Instruction specifies the memory address which can be used directly to access the memory
- Faster than the other memory addressing modes
- Too many bits are needed to specify the address for a large physical memory space
- EA = IR(addr) (IR(addr): address field of IR)
• Indirect Addressing Mode
The address field of an instruction specifies the address of a memory location that contains the address of the operand
- When the abbreviated address is used, large physical memory can be addressed with a relatively small number of bits
- Slow to acquire an operand because of an additional memory access
- EA = M[IR(address)]
• Relative Addressing Modes
The address field of an instruction specifies part of the address (abbreviated address), which can be used along with a designated register to calculate the address of the operand
- Address field of the instruction is short
- Large physical memory can be accessed with a small number of address bits
- EA = f(IR(address), R), R is sometimes implied
3 different Relative Addressing Modes depending on R:
* PC Relative Addressing Mode (R = PC)
- EA = PC + IR(address)
* Indexed Addressing Mode (R = IX, where IX: Index Register)
- EA = IX + IR(address)
* Base Register Addressing Mode (R = BAR, where BAR: Base Address Register)
- EA = BAR + IR(address)
ADDRESSING MODES - EXAMPLES -
Registers: PC = 200, R1 = 400, XR = 100, AC
Address   Memory
200       Load to AC | Mode
201       Address = 500
202       Next instruction
399       450
400       700
500       800
600       900
702       325
800       300

Addressing Mode     Effective Address                          Content of AC
Direct address      500    /* AC ← (500) */                    800
Immediate operand   -      /* AC ← 500 */                      500
Indirect address    800    /* AC ← ((500)) */                  300
Relative address    702    /* AC ← (PC + 500) */               325
Indexed address     600    /* AC ← (XR + 500) */               900
Register            -      /* AC ← R1 */                       400
Register indirect   400    /* AC ← (R1) */                     700
Autoincrement       400    /* AC ← (R1)+ */                    700
Autodecrement       399    /* AC ← -(R1) */                    450
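The numerical example can be reproduced with a short Python sketch. The memory contents and register values are the ones given in the example; the function name `load` and the constant names are hypothetical. The relative mode uses PC = 202 because the PC has already moved past the two-word instruction at 200 and 201 when the address is computed, and the register updates performed by autoincrement and autodecrement are not modelled.

```python
MEMORY = {399: 450, 400: 700, 500: 800, 600: 900, 702: 325, 800: 300}
PC = 202        # PC after the two-word instruction at 200-201 has been fetched
R1 = 400
XR = 100
ADDRESS = 500   # address field of the instruction

def load(mode):
    """Return (effective address or None, value loaded into AC) for a mode."""
    if mode == "immediate":
        return None, ADDRESS              # the operand is the address field itself
    if mode == "register":
        return None, R1                   # the operand is in the register
    ea = {
        "direct": ADDRESS,
        "indirect": MEMORY[ADDRESS],
        "relative": PC + ADDRESS,
        "indexed": XR + ADDRESS,
        "register indirect": R1,
        "autoincrement": R1,              # R1 is used first, then incremented
        "autodecrement": R1 - 1,          # R1 is decremented first, then used
    }[mode]
    return ea, MEMORY[ea]

if __name__ == "__main__":
    for mode in ("direct", "immediate", "indirect", "relative", "indexed",
                 "register", "register indirect", "autoincrement", "autodecrement"):
        print(mode, load(mode))
```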
DATA TRANSFER INSTRUCTIONS
Typical Data Transfer Instructions
Name       Mnemonic
Load       LD
Store      ST
Move       MOV
Exchange   XCH
Input      IN
Output     OUT
Push       PUSH
Pop        POP
Data Transfer Instructions with Different Addressing Modes
Mode                Assembly Convention   Register Transfer
Direct address      LD ADR                AC ← M[ADR]
Indirect address    LD @ADR               AC ← M[M[ADR]]
Relative address    LD $ADR               AC ← M[PC + ADR]
Immediate operand   LD #NBR               AC ← NBR
Index addressing    LD ADR(X)             AC ← M[ADR + XR]
Register            LD R1                 AC ← R1
Register indirect   LD (R1)               AC ← M[R1]
Autoincrement       LD (R1)+              AC ← M[R1], R1 ← R1 + 1
Autodecrement       LD -(R1)              R1 ← R1 - 1, AC ← M[R1]
FLAG, PROCESSOR STATUS WORD
In Basic Computer, the processor had several (status) flags – 1 bit values that indicated various information about the processor’s state – E, FGI, FGO, I, IEN, R
In some processors, flags like these are often combined into a register – the processor status register (PSR); sometimes called a processor status word (PSW)
Common flags in PSW are
C (Carry): Set to 1 if the carry out of the ALU is 1
S (Sign): The MSB bit of the ALU’s output
Z (Zero): Set to 1 if the ALU’s output is all 0’s
V (Overflow): Set to 1 if there is an overflow
Status Flag Circuit
[Figure: an 8-bit ALU with 8-bit inputs A and B and 8-bit output F (F7 - F0). C is taken from the carry c8, S from F7, Z from a check for zero output, and V from the carries c7 and c8.]
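The four status bits can be computed for an 8-bit addition as sketched below. This assumes the usual rule that V is the exclusive-OR of the carry into the sign bit (c7) and the carry out of it (c8), which matches the carries named in the status flag circuit; the function name `add8` is hypothetical.

```python
def add8(a, b, cin=0):
    """Add two 8-bit values and return (F, flags) with flags C, S, Z, V."""
    c7 = ((a & 0x7F) + (b & 0x7F) + cin) >> 7   # carry into bit 7 (the sign bit)
    total = (a & 0xFF) + (b & 0xFF) + cin
    c8 = total >> 8                              # carry out of bit 7
    f = total & 0xFF
    flags = {
        "C": c8,             # carry out of the ALU
        "S": f >> 7,         # most significant bit of the result
        "Z": int(f == 0),    # result is all 0's
        "V": c7 ^ c8,        # signed overflow
    }
    return f, flags

if __name__ == "__main__":
    print(add8(0x70, 0x70))   # two positive numbers whose sum overflows
    print(add8(0xFF, 0x01))   # -1 + 1 gives zero with a carry out
```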
PROGRAM CONTROL INSTRUCTIONS
PC ← PC + 1: In-Line Sequencing (next instruction is fetched from the next adjacent location in the memory)
PC ← address from another source (current instruction, stack, etc): Branch, Conditional Branch, Subroutine, etc
Program Control Instructions
Name              Mnemonic
Branch            BR
Jump              JMP
Skip              SKP
Call              CALL
Return            RTN
Compare (by -)    CMP
Test (by AND)     TST
* CMP and TST instructions do not retain their results of operations (- and AND, respectively). They only set or clear certain Flags.
CONDITIONAL BRANCH INSTRUCTIONS
Mnemonic   Branch condition             Tested condition
BZ         Branch if zero               Z = 1
BNZ        Branch if not zero           Z = 0
BC         Branch if carry              C = 1
BNC        Branch if no carry           C = 0
BP         Branch if plus               S = 0
BM         Branch if minus              S = 1
BV         Branch if overflow           V = 1
BNV        Branch if no overflow        V = 0
Unsigned compare conditions (A - B)
BHI        Branch if higher             A > B
BHE        Branch if higher or equal    A ≥ B
BLO        Branch if lower              A < B
BLOE       Branch if lower or equal     A ≤ B
BE         Branch if equal              A = B
BNE        Branch if not equal          A ≠ B
Signed compare conditions (A - B)
BGT        Branch if greater than       A > B
BGE        Branch if greater or equal   A ≥ B
BLT        Branch if less than          A < B
BLE        Branch if less or equal      A ≤ B
BE         Branch if equal              A = B
BNE        Branch if not equal          A ≠ B
SUBROUTINE CALL AND RETURN
Subroutine Call
- Call subroutine
- Jump to subroutine
- Branch to subroutine
- Branch and save return address
Two Most Important Operations are Implied:
* Branch to the beginning of the Subroutine
- Same as the Branch or Conditional Branch
* Save the Return Address to get the address of the location in the Calling Program upon exit from the Subroutine
Locations for storing Return Address
- Fixed Location in the subroutine (Memory)
- Fixed Location in memory
- In a processor Register
- In memory stack - most efficient way
CALL
SP ← SP - 1
M[SP] ← PC
PC ← EA
RTN
PC ← M[SP]
SP ← SP + 1
PROGRAM INTERRUPT
External interrupts
External Interrupts are initiated from outside the CPU and Memory
- I/O Device → Data transfer request or Data transfer complete
- Timing Device → Timeout
- Power Failure
- Operator
Internal interrupts (traps)
Internal Interrupts are caused by the currently running program
- Register, Stack Overflow
- Divide by zero
- OP-code Violation
- Protection Violation
Software Interrupts
Both External and Internal Interrupts are initiated by the computer HW. Software Interrupts are initiated by executing an instruction.
- Supervisor Call → Switching from a user mode to the supervisor mode → Allows execution of a certain class of operations which are not allowed in the user mode
INTERRUPT PROCEDURE
Interrupt Procedure and Subroutine Call
- The interrupt is usually initiated by an internal or an external signal rather than from the execution of an instruction (except for the software interrupt)
- The address of the interrupt service program is determined by the hardware rather than from the address field of an instruction
- An interrupt procedure usually stores all the information necessary to define the state of CPU rather than storing only the PC.
The state of the CPU is determined from:
- Content of the PC
- Content of all processor registers
- Content of status bits
There are many ways of saving the CPU state, depending on the CPU architecture
COMPLEX INSTRUCTION SET COMPUTER
• These computers with many instructions and addressing modes came to be known as Complex Instruction Set Computers (CISC)
• One goal for CISC machines was to have a machine language instruction to match each high-level language statement type
VARIABLE LENGTH INSTRUCTIONS
• The large number of instructions and addressing modes led CISC machines to have variable length instruction formats
• The large number of instructions means a greater number of bits to specify them
• In order to manage this large number of opcodes efficiently, they were encoded with different lengths:
– More frequently used instructions were encoded using short opcodes.
– Less frequently used ones were assigned longer opcodes.
• Also, multiple operand instructions could specify different addressing modes for each operand
– For example,
» Operand 1 could be a directly addressed register,
» Operand 2 could be an indirectly addressed memory location,
» Operand 3 (the destination) could be an indirectly addressed register.
• All of this led to the need to have different length instructions in different situations, depending on the opcode and operands used
• For example, an instruction that only specifies register operands may only be two bytes in length
– One byte to specify the instruction and addressing mode
– One byte to specify the source and destination registers.
• An instruction that specifies memory addresses for operands may need five bytes
– One byte to specify the instruction and addressing mode
– Two bytes to specify each memory address
» Maybe more if there’s a large amount of memory.
• Variable length instructions greatly complicate the fetch and decode problem for a processor
• The circuitry to recognize the various instructions and to properly fetch the required number of bytes for operands is very complex
• Another characteristic of CISC computers is that they have instructions that act directly on memory addresses
– For example, ADD L1, L2, L3 takes the contents of M[L1], adds it to the contents of M[L2] and stores the result in location M[L3]
• An instruction like this takes three memory access cycles to execute
• That makes for a potentially very long instruction execution cycle
• The problems with CISC computers are
– The complexity of the design may slow down the processor,
– The complexity of the design may result in costly errors in the processor design and implementation,
– Many of the instructions and addressing modes are used rarely, if ever
SUMMARY OF CISC FEATURES
→ Format, Length, Addressing Modes
→ Complicated instruction cycle control due to the complex decoding HW and decoding process
- Multiple memory cycle instructions
→ Operations on memory data
→ Multiple memory accesses/instruction
- Microprogrammed control is a necessity
→ Microprogram control storage takes a substantial portion of CPU chip area
→ Semantic Gap is large between machine instruction and microinstruction
- General purpose instruction set includes all the features required by individually different applications
→ When any one application is running, all the features required by the other applications are an extra burden to the application
REDUCED INSTRUCTION SET COMPUTERS
• In the late ’70s and early ’80s there was a reaction to the shortcomings of the CISC style of processors
• Reduced Instruction Set Computers (RISC) were proposed as an alternative
• The underlying idea behind RISC processors is to simplify the instruction set and reduce instruction execution time
• RISC processors often feature:
– Few instructions
– Few addressing modes
– Only load and store instructions access memory
– All other operations are done using on-processor registers
– Fixed length instructions
– Single cycle execution of instructions
– The control unit is hardwired, not microprogrammed
• Since all but the load and store instructions use only registers for operands, only a few addressing modes are needed
• By having all instructions the same length, reading them in is easy and fast
• The fetch and decode stages are simple, looking much more like Mano’s Basic Computer than a CISC machine
• The instruction and address formats are designed to be easy to decode
• Unlike the variable length CISC instructions, the opcode and register fields of RISC instructions can be decoded simultaneously
• The control logic of a RISC processor is designed to be simple and fast
• The control logic is simple because of the small number of instructions and the simple addressing modes
• The control logic is hardwired, rather than microprogrammed, because hardwired control is faster
UNIT -3
COMPARISON OF CONTROL UNIT IMPLEMENTATIONS
Implementation of Control Unit
Control Unit Implementation
Combinational Logic Circuits (Hard-wired)
[Figure: hard-wired control unit. Labels: memory, IR, status F/Fs, control data, control unit's state (timing state, ins. cycle state), combinational logic circuits, control points, CPU]
Microprogram
[Figure: microprogrammed control unit. Labels: memory, IR, status F/Fs, control data, next address generation logic, CSAR, control storage (μ-program memory), CSDR, control points (CPs), CPU]
TERMINOLOGY
Microprogram
- Program stored in memory that generates all the control signals required to execute the instruction set correctly
- Consists of microinstructions
Microinstruction
- Contains a control word and a sequencing word
  Control Word - All the control information required for one clock cycle
  Sequencing Word - Information needed to decide the next microinstruction address
- Vocabulary to write a microprogram
Control Memory (Control Storage: CS)
- Storage in the microprogrammed control unit to store the microprogram
Writeable Control Memory (Writeable Control Storage: WCS)
- CS whose contents can be modified
-> Allows the microprogram to be changed
-> Instruction set can be changed or modified
Dynamic Microprogramming
- Computer system whose control unit is implemented with a microprogram in WCS
- Microprogram can be changed by a systems programmer or a user
Sequencer (Microprogram Sequencer)
MICROINSTRUCTION SEQUENCING
Sequencing Capabilities Required in a Control Storage
- Incrementing of the control address register
- Unconditional and conditional branches
- A mapping process from the bits of the machine instruction to an address for control memory
- A facility for subroutine call and return
[Figure: selection of the next address for control memory. Labels: instruction code, mapping logic, status bits, branch logic (MUX select, select a status bit), multiplexers, branch address, subroutine register (SBR), incrementer, control address register (CAR), control memory (ROM), microoperations]
CONDITIONAL BRANCH
If Condition is true, then Branch (address from the next address field of the current microinstruction), else Fall Through
Conditions to Test: O (overflow), N (negative), Z (zero), C (carry), etc.
[Figure: conditional branch logic. Labels: control address register (load address, increment), control memory (micro-operations, condition select, next address), MUX, status (condition) bits]
A Microprogram Control Unit that determines the Microinstruction Address to be executed in the next clock cycle
- In-line Sequencing
- Branch
- Conditional Branch
- Subroutine
- Loop
- Instruction OP-code mapping
Unconditional Branch: Fixing the value of one status bit at the input of the multiplexer to 1
MAPPING OF INSTRUCTIONS
Direct Mapping

OP-code   Instruction   Control Storage Address   Routine
0000      ADD           0000                      ADD Routine
0001      AND           0001                      AND Routine
0010      LDA           0010                      LDA Routine
0011      STA           0011                      STA Routine
0100      BUN           0100                      BUN Routine

Mapping Bits 10 xxxx 010

Address       Routine
10 0000 010   ADD Routine
10 0001 010   AND Routine
10 0010 010   LDA Routine
10 0011 010   STA Routine
10 0100 010   BUN Routine
MAPPING OF INSTRUCTIONS TO MICROROUTINES
Mapping from the OP-code of an instruction to the address of the microinstruction which is the starting microinstruction of its execution microprogram

Machine instruction:        OP-code 1 0 1 1 | Address
Mapping bits:               0 x x x x 0 0
Microinstruction address:   0 1 0 1 1 0 0

Mapping function implemented by ROM or PLA
[Figure: OP-code -> mapping memory (ROM or PLA) -> control address register -> control memory]
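The bit-pattern mapping 0xxxx00 can be sketched in a few lines. This is only an illustration: the function name `map_opcode` is an assumption, not something defined in these notes.

```python
def map_opcode(opcode: int) -> int:
    """Return the 7-bit control memory address 0xxxx00 for a 4-bit OP-code."""
    if not 0 <= opcode <= 0b1111:
        raise ValueError("OP-code must fit in 4 bits")
    # The OP-code occupies address bits 2-5; bits 0, 1 and 6 are forced to 0.
    return opcode << 2


# OP-code 1011 maps to microinstruction address 0101100.
print(format(map_opcode(0b1011), "07b"))
```

Because the two low-order bits are always 0, each routine gets four consecutive control memory words before the next routine starts.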
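MICROPROGRAM EXAMPLE
Computer Configuration
[Figure: computer hardware configuration. Registers: AR (bits 10-0), PC (bits 10-0), DR (bits 15-0), AC (bits 15-0), SBR (bits 6-0), CAR (bits 6-0). Units: memory 2048 x 16 with its address input, two multiplexers (MUX), arithmetic logic and shift unit, control memory 128 x 20, control unit]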
MACHINE INSTRUCTION FORMAT
Machine instruction format

Bit 15   Bits 14-11   Bits 10-0
I        Opcode       Address

Sample machine instructions

Symbol     OP-code   Description
ADD        0000      AC ← AC + M[EA]
BRANCH     0001      if (AC < 0) then (PC ← EA)
STORE      0010      M[EA] ← AC
EXCHANGE   0011      AC ← M[EA], M[EA] ← AC

EA is the effective address

Microinstruction Format

Field   F1   F2   F3   CD   BR   AD
Bits    3    3    3    2    2    7

F1, F2, F3: Microoperation fields
CD: Condition for branching
BR: Branch field
AD: Address field
MICROINSTRUCTION FIELD DESCRIPTIONS - F1,F2,F3

F1    Microoperation     Symbol
000   None               NOP
001   AC ← AC + DR       ADD
010   AC ← 0             CLRAC
011   AC ← AC + 1        INCAC
100   AC ← DR            DRTAC
101   AR ← DR(0-10)      DRTAR
110   AR ← PC            PCTAR
111   M[AR] ← DR         WRITE

F2    Microoperation     Symbol
000   None               NOP
001   AC ← AC – DR       SUB
010   AC ← AC ∨ DR       OR
011   AC ← AC ∧ DR       AND
100   DR ← M[AR]         READ
101   DR ← AC            ACTDR
110   DR ← DR + 1        INCDR
111   DR(0-10) ← PC      PCTDR

F3    Microoperation     Symbol
000   None               NOP
001   AC ← AC ⊕ DR       XOR
010   AC ← AC’           COM
011   AC ← shl AC        SHL
100   AC ← shr AC        SHR
101   PC ← PC + 1        INCPC
110   PC ← AR            ARTPC
111   Reserved
MICROINSTRUCTION FIELD DESCRIPTIONS - CD, BR

CD   Condition    Symbol   Comments
00   Always = 1   U        Unconditional branch
01   DR(15)       I        Indirect address bit
10   AC(15)       S        Sign bit of AC
11   AC = 0       Z        Zero value in AC

BR   Symbol   Function
00   JMP      CAR ← AD if condition = 1; CAR ← CAR + 1 if condition = 0
01   CALL     CAR ← AD, SBR ← CAR + 1 if condition = 1; CAR ← CAR + 1 if condition = 0
10   RET      CAR ← SBR (Return from subroutine)
11   MAP      CAR(2-5) ← DR(11-14), CAR(0,1,6) ← 0
SYMBOLIC MICROINSTRUCTIONS
• Symbols are used in microinstructions as in assembly language
• A symbolic microprogram can be translated into its binary equivalent by a microprogram assembler.
• Sample Format: five fields: label; micro-ops; CD; BR; AD
Label: may be empty or may specify a symbolic address terminated with a colon
Micro-ops: consists of one, two, or three symbols separated by commas
CD: one of {U, I, S, Z}, where
  U: Unconditional Branch
  I: Indirect address bit
  S: Sign of AC
  Z: Zero value in AC
BR: one of {JMP, CALL, RET, MAP}
AD: one of {Symbolic address, NEXT, empty}
SYMBOLIC MICROPROGRAM - FETCH ROUTINE
During FETCH, read an instruction from memory, decode the instruction, and update PC

Sequence of microoperations in the fetch cycle:
AR ← PC
DR ← M[AR], PC ← PC + 1
AR ← DR(0-10), CAR(2-5) ← DR(11-14), CAR(0,1,6) ← 0

Symbolic microprogram for the fetch cycle:
          ORG 64
FETCH:    PCTAR          U   JMP   NEXT
          READ, INCPC    U   JMP   NEXT
          DRTAR          U   MAP

Binary equivalents translated by an assembler:
Binary address   F1    F2    F3    CD   BR   AD
1000000          110   000   000   00   00   1000001
1000001          000   100   101   00   00   1000010
1000010          101   000   000   00   11   0000000
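The translation from a symbolic microinstruction to its 20-bit binary form can be sketched as follows. The field codes are taken from the F1, F2, F3, CD and BR tables; the function name `assemble`, its arguments and the symbol-table dictionary are assumptions made for this illustration, not a real assembler.

```python
# Field encodings copied from the microinstruction field tables.
F1 = {"ADD": 1, "CLRAC": 2, "INCAC": 3, "DRTAC": 4, "DRTAR": 5, "PCTAR": 6, "WRITE": 7}
F2 = {"SUB": 1, "OR": 2, "AND": 3, "READ": 4, "ACTDR": 5, "INCDR": 6, "PCTDR": 7}
F3 = {"XOR": 1, "COM": 2, "SHL": 3, "SHR": 4, "INCPC": 5, "ARTPC": 6}
CD = {"U": 0, "I": 1, "S": 2, "Z": 3}
BR = {"JMP": 0, "CALL": 1, "RET": 2, "MAP": 3}


def assemble(address, microops, cd, br, ad=None, symbols=None):
    """Encode one symbolic microinstruction as 'F1 F2 F3 CD BR AD' bit strings.

    address  -- control memory address of this microinstruction
    microops -- list of up to three micro-operation symbols (NOP is ignored)
    ad       -- 'NEXT', a label found in `symbols`, or None for an empty field
    """
    symbols = symbols or {}
    fields = [0, 0, 0]
    for op in microops:
        if op == "NOP":
            continue
        for position, table in enumerate((F1, F2, F3)):
            if op in table:
                if fields[position]:
                    raise ValueError("two micro-operations use the same field")
                fields[position] = table[op]
                break
        else:
            raise ValueError("unknown micro-operation: " + op)
    if ad == "NEXT":
        target = address + 1
    elif ad is None:
        target = 0
    else:
        target = symbols[ad]
    return "{:03b} {:03b} {:03b} {:02b} {:02b} {:07b}".format(
        fields[0], fields[1], fields[2], CD[cd], BR[br], target)


# The three microinstructions of the fetch routine, starting at address 64.
print(assemble(64, ["PCTAR"], "U", "JMP", "NEXT"))
print(assemble(65, ["READ", "INCPC"], "U", "JMP", "NEXT"))
print(assemble(66, ["DRTAR"], "U", "MAP"))
```

Two micro-operations from the same field cannot be combined in one microinstruction, which is why the sketch rejects that case.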
SYMBOLIC MICROPROGRAM
Control Storage: 128 20-bit words
The first 64 words: Routines for the 16 machine instructions
The last 64 words: Used for other purposes (e.g., fetch routine and other subroutines)
Mapping: OP-code XXXX into 0XXXX00; the first addresses of the 16 routines are 0 (0 0000 00), 4 (0 0001 00), 8, 12, 16, 20, ..., 60

Partial Symbolic Microprogram

Label       Microops        CD   BR     AD
            ORG 0
ADD:        NOP             I    CALL   INDRCT
            READ            U    JMP    NEXT
            ADD             U    JMP    FETCH

            ORG 4
BRANCH:     NOP             S    JMP    OVER
            NOP             U    JMP    FETCH
OVER:       NOP             I    CALL   INDRCT
            ARTPC           U    JMP    FETCH

            ORG 8
STORE:      NOP             I    CALL   INDRCT
            ACTDR           U    JMP    NEXT
            WRITE           U    JMP    FETCH

            ORG 12
EXCHANGE:   NOP             I    CALL   INDRCT
            READ            U    JMP    NEXT
            ACTDR, DRTAC    U    JMP    NEXT
            WRITE           U    JMP    FETCH

            ORG 64
FETCH:      PCTAR           U    JMP    NEXT
            READ, INCPC     U    JMP    NEXT
            DRTAR           U    MAP
INDRCT:     READ            U    JMP    NEXT
            DRTAR           U    RET
DESIGN OF CONTROL UNIT - DECODING ALU CONTROL INFORMATION -
[Figure: decoding of the microoperation fields. Each of F1, F2 and F3 feeds a 3 x 8 decoder (outputs 7-0). Labels: AND and ADD into the arithmetic logic and shift unit; DRTAC; AC with load input; DRTAR and PCTAR; multiplexers (select 0/1) with inputs from PC and from DR(0-10); AR with load input; clock; DR]
BINARY MICROPROGRAM
This microprogram can be implemented using ROM

Micro Routine   Address              Binary Microinstruction
                Decimal   Binary     F1    F2    F3    CD   BR   AD
ADD             0         0000000    000   000   000   01   01   1000011
                1         0000001    000   100   000   00   00   0000010
                2         0000010    001   000   000   00   00   1000000
                3         0000011    000   000   000   00   00   1000000
BRANCH          4         0000100    000   000   000   10   00   0000110
                5         0000101    000   000   000   00   00   1000000
                6         0000110    000   000   000   01   01   1000011
                7         0000111    000   000   110   00   00   1000000
STORE           8         0001000    000   000   000   01   01   1000011
                9         0001001    000   101   000   00   00   0001010
                10        0001010    111   000   000   00   00   1000000
                11        0001011    000   000   000   00   00   1000000
EXCHANGE        12        0001100    000   000   000   01   01   1000011
                13        0001101    000   100   000   00   00   0001110
                14        0001110    100   101   000   00   00   0001111
                15        0001111    111   000   000   00   00   1000000
FETCH           64        1000000    110   000   000   00   00   1000001
                65        1000001    000   100   101   00   00   1000010
                66        1000010    101   000   000   00   11   0000000
INDRCT          67        1000011    000   100   000   00   00   1000100
                68        1000100    101   000   000   00   10   0000000
MICROPROGRAM SEQUENCER - NEXT MICROINSTRUCTION ADDRESS LOGIC -
MUX-1 selects an address from one of four sources and routes it into the CAR
- In-Line Sequencing: CAR + 1
- Branch, Subroutine Call: CS(AD)
- Return from Subroutine: Output of SBR
- New Machine instruction: MAP

Address source selection
S1S0   Address Source
00     CAR + 1, In-Line
01     SBR, RETURN
10     CS(AD), Branch or CALL
11     MAP

[Figure: next-address logic. Labels: MUX1 (inputs 3-0, select lines S1 S0), external (MAP) address, SBR with load input L (subroutine CALL), return from subroutine, branch/CALL address from control storage, incrementer (in-line), CAR, clock, control storage]
MICROPROGRAM SEQUENCER - CONDITION AND BRANCH CONTROL -
[Figure: MUX2 selects one of the inputs 1, I, S, Z (from the CPU) according to the CD field of CS and produces the test bit T. The input logic takes I0, I1 (BR field of CS) and T and produces S1, S0 for next address selection and L (load SBR with CAR + 1) for subroutine call]

Input Logic

I0 I1 T   Meaning   Source of Address            S1S0   L
0  0  0   In-Line   CAR + 1                      00     0
0  0  1   JMP       CS(AD)                       10     0
0  1  0   In-Line   CAR + 1                      00     0
0  1  1   CALL      CS(AD) and SBR <- CAR + 1    10     1
1  0  x   RET       SBR                          01     0
1  1  x   MAP       DR(11-14)                    11     0

S0 = I0
S1 = I0I1 + I0’T
L = I0’I1T
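The three input-logic equations can be checked against the truth table with a short script. This is a sketch only; the function name `input_logic` and the tuple ordering of its result are assumptions for illustration.

```python
def input_logic(i0: int, i1: int, t: int):
    """Return (S1, S0, L) from the BR bits I0, I1 and the test bit T."""
    not_i0 = 1 - i0
    s0 = i0
    s1 = (i0 & i1) | (not_i0 & t)
    load = not_i0 & i1 & t          # L: load SBR on a successful CALL
    return s1, s0, load


# Print the full truth table in the order I0 I1 T -> S1 S0 L.
for i0 in (0, 1):
    for i1 in (0, 1):
        for t in (0, 1):
            print(i0, i1, t, "->", *input_logic(i0, i1, t))
```

With S1S0 = 00 the sequencer takes CAR + 1, with 10 it takes the branch address CS(AD), with 01 it takes SBR, and with 11 it takes the mapped address.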
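MICROPROGRAM SEQUENCER
[Figure: complete microprogram sequencer. Labels: MUX1 (inputs 3-0, select lines S1 S0), external (MAP) address, SBR with load input L, incrementer, CAR, clock, input logic (I0, I1, T), MUX2 (select; inputs 1, I, S, Z; output test), control memory with fields Microops, CD, BR, AD]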
MICROINSTRUCTION FORMAT
Information in a Microinstruction
- Control Information
- Sequencing Information
- Constant: Information which is useful when feeding into the system
This information needs to be organized in some way for
- Efficient use of the microinstruction bits
- Fast decoding
Field Encoding
- Encoding the microinstruction bits
- Encoding slows down the execution speed due to the decoding delay
- Encoding also reduces the flexibility due to the decoding hardware
HORIZONTAL AND VERTICAL MICROINSTRUCTION FORMAT
Horizontal Microinstructions
- Each bit directly controls each micro-operation or each control point
- Horizontal implies a long microinstruction word
- Advantages: Can control a variety of components operating in parallel --> Advantage of efficient hardware utilization
- Disadvantages: Control word bits are not fully utilized --> CS becomes large --> Costly
Vertical Microinstructions
- A microinstruction format that is not horizontal
- Vertical implies a short microinstruction word
- Encoded microinstruction fields --> Needs decoding circuits for one or two levels of decoding
One-level decoding: Field A (2 bits) -> 2 x 4 decoder -> 1 of 4; Field B (3 bits) -> 3 x 8 decoder -> 1 of 8
Two-level decoding: Field A (2 bits) -> 2 x 4 decoder; Field B (6 bits) -> 6 x 64 decoder; decoder and selection logic
UNIT -4
ARITHMETIC AND LOGIC UNIT
ALU Inputs and Outputs
Integer Representation
• Only have 0 & 1 to represent everything
• Positive numbers stored in binary
– e.g. 41 = 00101001
• No minus sign
• No period
• Sign-Magnitude
• Two’s complement
Sign-Magnitude
• Left most bit is sign bit
• 0 means positive
• 1 means negative
• +18 = 00010010
• -18 = 10010010
• Problems
– Need to consider both sign and magnitude in arithmetic
– Two representations of zero (+0 and -0)
Two’s Complement
• +3 = 00000011
• +2 = 00000010
• +1 = 00000001
• +0 = 00000000
• -1 = 11111111
• -2 = 11111110
• -3 = 11111101
Benefits
• One representation of zero
• Arithmetic works easily (see later)
• Negating is fairly easy
– 3 = 00000011
– Boolean complement gives 11111100
– Add 1 to LSB 11111101
Range of Numbers
• 8 bit 2s complement
– +127 = 01111111 = 2^7 - 1
– -128 = 10000000 = -2^7
• 16 bit 2s complement
– +32767 = 01111111 11111111 = 2^15 - 1
– -32768 = 10000000 00000000 = -2^15
Conversion Between Lengths
• Positive numbers pack with leading zeros
• +18 = 00010010
• +18 = 00000000 00010010
• Negative numbers pack with leading ones
• -18 = 11101110
• -18 = 11111111 11101110
• i.e. pack with MSB (sign bit)
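The two's complement rules for encoding, decoding and conversion between lengths can be tried out with a small sketch. The helper names below (`to_twos_complement`, `from_twos_complement`, `sign_extend`) are assumptions for illustration, and bit patterns are handled as strings for readability.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Return the two's complement bit pattern of `value` in `bits` bits."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError("value does not fit in the given number of bits")
    return format(value & ((1 << bits) - 1), "0{}b".format(bits))


def from_twos_complement(pattern: str) -> int:
    """Decode a two's complement bit pattern into a signed integer."""
    raw = int(pattern, 2)
    # A leading 1 means the value is negative: subtract 2^bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


def sign_extend(pattern: str, bits: int) -> str:
    """Widen a pattern by packing with copies of the sign bit (MSB)."""
    return pattern[0] * (bits - len(pattern)) + pattern


print(to_twos_complement(-3, 8))                    # 11111101
print(to_twos_complement(-18, 8))                   # 11101110
print(sign_extend(to_twos_complement(-18, 8), 16))  # 1111111111101110
```

Sign extension leaves the decoded value unchanged, which is why packing with the MSB works for both positive and negative numbers.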
Addition and Subtraction
• Normal binary addition
• Monitor sign bit for overflow
• Take two’s complement of subtrahend and add to minuend
– i.e. a - b = a + (-b)
• So we only need addition and complement circuits
Multiplication
• Complex
• Work out partial product for each digit
• Take care with place value (column)
• Add partial products
Multiplication Example
      1011   Multiplicand (11 dec)
    x 1101   Multiplier (13 dec)
      1011   Partial products
     0000    Note: if multiplier bit is 1 copy multiplicand (place value), otherwise zero
    1011
   1011
  10001111   Product (143 dec)
• Note: need double length result
Flowchart for Unsigned Binary Multiplication
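A minimal sketch of shift-and-add multiplication for unsigned integers follows. It assumes the common register arrangement of a carry bit C, an accumulator A, a multiplier register Q and a multiplicand register M; these names and the function `multiply_unsigned` are assumptions for illustration.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, bits: int) -> int:
    """Multiply two unsigned `bits`-bit integers by repeated add and shift."""
    mask = (1 << bits) - 1
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    c = 0
    for _ in range(bits):
        if q & 1:                      # Q0 = 1: add multiplicand to A
            total = a + m
            c = total >> bits          # carry out of the addition
            a = total & mask
        # Shift C, A, Q right by one position as a single register.
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (c << (bits - 1))
        c = 0
    return (a << bits) | q             # double-length product in A, Q


print(multiply_unsigned(0b1011, 0b1101, 4))  # 11 x 13 = 143
```

The product occupies A and Q together, which matches the note that a double-length result is needed.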
Multiplying Negative Numbers
• The unsigned method above does not work for negative numbers!
• Solution 1
– Convert to positive if required
– Multiply as above
– If signs were different, negate answer
• Solution 2
– Booth’s algorithm
Booth’s Algorithm
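A minimal sketch of Booth's algorithm for two's complement operands is given below, assuming the usual registers A, Q, Q-1 and M. The function name `booth_multiply` is an assumption for illustration. As a known limitation of this basic form, the multiplicand must not be the most negative value (-2^(bits-1)).

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Multiply two signed `bits`-bit integers with Booth's algorithm."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    q_minus_1 = 0
    for _ in range(bits):
        pair = (q & 1, q_minus_1)
        if pair == (1, 0):             # start of a run of 1s: A <- A - M
            a = (a - m) & mask
        elif pair == (0, 1):           # end of a run of 1s: A <- A + M
            a = (a + m) & mask
        # Arithmetic shift right of A, Q, Q-1 (the sign bit of A is kept).
        q_minus_1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & sign)
    product = (a << bits) | q
    if product & (1 << (2 * bits - 1)):  # interpret the result as signed
        product -= 1 << (2 * bits)
    return product


print(booth_multiply(7, 3, 4))    # 21
print(booth_multiply(-7, 3, 4))   # -21
```

Runs of 1s in the multiplier cost only one subtraction and one addition, which is the main attraction of the method.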
Division
• More complex than multiplication
• Negative numbers are really bad!
• Based on long division
Division of Unsigned Binary Integers
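Long division of unsigned binary integers can be sketched with the restoring method, one common hardware scheme. The register names A (partial remainder) and Q (dividend, then quotient) and the function `divide_unsigned` are assumptions for illustration.

```python
def divide_unsigned(dividend: int, divisor: int, bits: int):
    """Return (quotient, remainder) of two unsigned `bits`-bit integers."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << bits) - 1
    a = 0
    q = dividend & mask
    for _ in range(bits):
        # Shift A, Q left by one position as a single register.
        a = (a << 1) | (q >> (bits - 1))
        q = (q << 1) & mask
        a -= divisor                   # trial subtraction
        if a < 0:
            a += divisor               # restore: quotient bit stays 0
        else:
            q |= 1                     # subtraction succeeded: quotient bit 1
    return q, a


print(divide_unsigned(0b10010011, 0b1011, 8))  # 147 / 11 -> (13, 4)
```

Each iteration produces one quotient bit, so the loop runs once per bit of the dividend.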
Real Numbers
• Numbers with fractions
• Could be done in pure binary
– 1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
• Where is the binary point?
• Fixed?
– Very limited
• Moving?
– How do you show where it is?
Floating Point
• +/- .significand x 2^exponent
• Floating point is a misnomer
• Point is actually fixed between sign bit and body of mantissa
• Exponent indicates place value (point position)
Signs for Floating Point
• Mantissa is stored in 2s complement
• Exponent is in excess or biased notation
– e.g. Excess (bias) 128 means
– 8 bit exponent field
– Pure value range 0-255
– Subtract 128 to get correct value
– Range -128 to +127
Normalization
• FP numbers are usually normalized
• i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1
• Since it is always 1 there is no need to store it
• (c.f. scientific notation, where numbers are normalized to give a single digit before the decimal point, e.g. 3.123 x 10^3)
Floating Point Arithmetic +/-
• Check for zeros
• Align significands (adjusting exponents)
• Add or subtract significands
• Normalize result
Floating Point Addition & Subtraction Flowchart
FP Arithmetic x/÷
• Check for zero
• Add/subtract exponents
• Multiply/divide significands (watch sign)
• Normalize
• Round
• All intermediate results should be in double length storage
Floating Point Multiplication
Floating point Division
UNIT -5
MEMORY HIERARCHY
Memory Hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system
[Figure: memory hierarchy. Auxiliary memory (magnetic tapes, magnetic disks) connects through the I/O processor to main memory; the CPU accesses main memory through cache memory]
Levels, from fastest to slowest: Register, Cache, Main Memory, Magnetic Disk, Magnetic Tape
MAIN MEMORY
RAM and ROM Chips
Typical RAM chip: 128 x 8 RAM
Inputs: chip select 1 (CS1), chip select 2 (CS2), read (RD), write (WR), 7-bit address (AD 7); 8-bit data bus

CS1   CS2   RD   WR   Memory function   State of data bus
0     0     x    x    Inhibit           High-impedance
0     1     x    x    Inhibit           High-impedance
1     0     0    0    Inhibit           High-impedance
1     0     0    1    Write             Input data to RAM
1     0     1    x    Read              Output data from RAM
1     1     x    x    Inhibit           High-impedance

Typical ROM chip: 512 x 8 ROM
Inputs: chip select 1 (CS1), chip select 2 (CS2), 9-bit address (AD 9); 8-bit data bus
MEMORY ADDRESS MAP
Address space assignment to each memory chip
Example: 512 bytes RAM and 512 bytes ROM

Component   Hexa address   Address bus lines
                           10 9 8 7 6 5 4 3 2 1
RAM 1       0000 - 007F    0  0 0 x x x x x x x
RAM 2       0080 - 00FF    0  0 1 x x x x x x x
RAM 3       0100 - 017F    0  1 0 x x x x x x x
RAM 4       0180 - 01FF    0  1 1 x x x x x x x
ROM         0200 - 03FF    1  x x x x x x x x x

Memory Connection to CPU
- RAM and ROM chips are connected to a CPU through the data and address buses
- The low-order lines in the address bus select the byte within the chips and other lines in the address bus select a particular chip through its chip select inputs
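The address map can be read as a small decoding function: line 10 separates RAM from ROM, lines 9 and 8 pick one of the four RAM chips, and the remaining low-order lines address a byte inside the chip. The function name `select_chip` and its return format are assumptions for illustration.

```python
def select_chip(address: int):
    """Return (component, offset within the chip) for a 10-bit address."""
    if not 0 <= address <= 0x3FF:
        raise ValueError("address must fit in 10 bits")
    if address & 0x200:                     # line 10 = 1 selects the ROM
        return "ROM", address & 0x1FF       # lines 1-9 address 512 bytes
    chip = (address >> 7) & 0b11            # lines 9 and 8 select a RAM chip
    return "RAM {}".format(chip + 1), address & 0x7F  # lines 1-7: 128 bytes


for addr in (0x0000, 0x00FF, 0x0180, 0x0200, 0x03FF):
    print(format(addr, "04X"), select_chip(addr))
```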
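CONNECTION OF MEMORY TO CPU
[Figure: the CPU address bus (lines 16-11, 10, 9, 8, 7-1), RD, WR and the data bus connected to four 128 x 8 RAM chips (RAM 1 to RAM 4, each with CS1, CS2, RD, WR, AD7) and one 512 x 8 ROM (CS1, CS2, AD9). Lines 1-7 go to the AD7 inputs of the RAM chips, lines 8 and 9 drive a decoder (outputs 3-0) that selects one RAM chip, line 10 distinguishes RAM from ROM, and lines 1-9 go to the AD9 input of the ROM; all chips share the data bus]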
AUXILIARY MEMORY
Information Organization on Magnetic Tapes
[Figure: file i on tape is a sequence of blocks (block 1, block 2, block 3) holding records R1 to R6, separated by gaps (IRG) and terminated by an end-of-file mark (EOF)]
Organization of Disk Hardware
[Figure: tracks on a moving head disk and on a fixed head disk]
ASSOCIATIVE MEMORY
- Accessed by the content of the data rather than by an address
- Also called Content Addressable Memory (CAM)
Hardware Organization
[Figure: argument register (A) and key register (K) feed the associative memory array and logic (m words, n bits per word), which has input, read and write lines and sets the match register M]
- Compare each word in CAM in parallel with the content of A (Argument Register)
- If CAM Word[i] = A, M(i) = 1
- Read by sequentially accessing CAM for CAM Word(i) for M(i) = 1
- K (Key Register) provides a mask for choosing a particular field or key in the argument in A (only those bits in the argument that have 1’s in their corresponding position of K are compared)
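The masked comparison can be sketched as below. Hardware compares all words in parallel; this sketch simply loops over them. The function name `cam_match` and the sample bit patterns are assumptions chosen for illustration.

```python
def cam_match(words, argument: int, key: int):
    """Return the match register M: M[i] = 1 if word i equals the argument
    in every bit position where the key register holds a 1."""
    return [1 if ((word ^ argument) & key) == 0 else 0 for word in words]


# Only the three high-order bits are compared because K = 111 000000.
A = 0b101111100
K = 0b111000000
words = [0b100111100, 0b101000001]
print(cam_match(words, A, K))  # [0, 1]
```

With K set to all 1s the whole word must match; with K set to all 0s every word matches.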
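ORGANIZATION OF CAM
[Figure: associative memory of m words with n cells per word. Cell Cij holds bit j of word i; argument bits A1 ... Aj ... An and key bits K1 ... Kj ... Kn run down the columns, and each word i (Word 1 ... Word i ... Word m) produces a match bit Mi (M1 ... Mi ... Mm)]
Internal organization of a typical cell Cij
[Figure: flip-flop Fij with R and S inputs, input, write, read and output lines, and match logic that combines Aj, Kj and Fij and feeds Mi]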
CACHE MEMORY
Locality of Reference
- The references to memory at any given time interval tend to be confined within a localized area
- This area contains a set of information and the membership changes gradually as time goes by
- Temporal Locality: The information which will be used in the near future is likely to be in use already (e.g. reuse of information in loops)
- Spatial Locality: If a word is accessed, adjacent (near) words are likely to be accessed soon (e.g. related data items (arrays) are usually stored together; instructions are executed sequentially)
Cache
- The property of Locality of Reference makes the cache memory systems work
- Cache is a fast, small-capacity memory that should hold the information which is most likely to be accessed
[Figure: CPU <-> cache memory <-> main memory]
PERFORMANCE OF CACHE
Memory Access
All the memory accesses are directed first to Cache
If the word is in Cache: access cache to provide it to CPU
If the word is not in Cache: bring a block (or a line) including that word to replace a block now in Cache
- How can we know if the word that is required is there?
- If a new block is to replace one of the old blocks, which one should we choose?
Performance of Cache Memory System
Hit Ratio (h) - fraction of memory accesses satisfied by the cache memory system
Te: Effective memory access time in cache memory system
Tc: Cache access time
Tm: Main memory access time
Te = Tc + (1 - h) Tm
Example: Tc = 0.4 μs, Tm = 1.2 μs, h = 0.85 (85%)
Te = 0.4 + (1 - 0.85) * 1.2 = 0.58 μs
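The effective access time formula is easy to evaluate for different hit ratios. The function name `effective_access_time` is an assumption for illustration; times can be in any unit as long as Tc and Tm use the same one.

```python
def effective_access_time(tc: float, tm: float, h: float) -> float:
    """Te = Tc + (1 - h) * Tm, with the hit ratio h given as a fraction."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return tc + (1.0 - h) * tm


# The example from the notes: Tc = 0.4, Tm = 1.2, h = 0.85.
print(round(effective_access_time(0.4, 1.2, 0.85), 2))  # 0.58
```

Every access pays the cache time Tc, and only the fraction (1 - h) of misses additionally pays the main memory time Tm.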
Cache Memory
MEMORY AND CACHE MAPPING - ASSOCIATIVE MAPPING -
Mapping Function: Specification of correspondence between main memory blocks and cache blocks
- Associative mapping
- Direct mapping
- Set-associative mapping
Associative Mapping
- Any block location in Cache can store any block in memory -> Most flexible
- Mapping Table is implemented in an associative memory -> Fast, very expensive
- Mapping Table stores both address and the content of the memory word
[Figure: the address (15 bits) is placed in the argument register and compared with the CAM contents]

CAM
Address   Data
01000     3450
02777     6710
22235     1234
MEMORY AND CACHE MAPPING - DIRECT MAPPING -
- Each memory block has only one place to load in Cache
- Mapping Table is made of RAM instead of CAM
- n-bit memory address consists of 2 parts: k bits of Index field and n-k bits of Tag field
- n-bit addresses are used to access main memory and k-bit Index is used to access the Cache
Addressing Relationships
Address = Tag (6 bits) + Index (9 bits)
Main memory: 32K x 12, Address = 15 bits, Data = 12 bits, addresses 00 000 to 77 777
Cache memory: 512 x 12, Address = 9 bits, Data = 12 bits, addresses 000 to 777
Direct Mapping Cache Organization

Main memory
Memory address   Memory data
00000            1220
00777            2340
01000            3450
01777            4560
02000            5670
02777            6710

Cache memory
Index address   Tag   Data
000             00    1220
777             02    6710
DIRECT MAPPING
Operation
- CPU generates a memory request with (TAG;INDEX)
- Access Cache using INDEX; (tag; data)
- Compare TAG and tag
- If matches -> Hit
    Provide Cache[INDEX](data) to CPU
- If not match -> Miss
    M[tag;INDEX] <- Cache[INDEX](data)
    Cache[INDEX] <- (TAG;M[TAG; INDEX])
    CPU <- Cache[INDEX](data)
Direct Mapping with block size of 8 words
Address fields: Tag (6 bits), Block (6 bits), Word (3 bits); INDEX = Block and Word fields together

Block      Index   Tag   Data
Block 0    000     01    3450
           007     01    6578
Block 1    010
           017
Block 63   770     02
           777     02    6710
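The direct mapping operation (with one word per cache line) can be sketched as a small simulation. The class name `DirectMappedCache`, the dictionary-based memory and the hit/miss counters are assumptions for illustration; on a miss the old line is written back to memory before the new word is loaded, following the steps listed under Operation.

```python
class DirectMappedCache:
    """Direct-mapped cache holding one word per index position."""

    def __init__(self, memory, index_bits=9):
        self.memory = memory              # dict: address -> data word
        self.index_bits = index_bits
        self.lines = {}                   # index -> (tag, data)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            self.hits += 1                # tags match -> hit
            return line[1]
        self.misses += 1
        if line is not None:              # M[tag;INDEX] <- Cache[INDEX](data)
            old_tag, old_data = line
            self.memory[(old_tag << self.index_bits) | index] = old_data
        data = self.memory[address]       # Cache[INDEX] <- (TAG; M[TAG;INDEX])
        self.lines[index] = (tag, data)
        return data


memory = {0o00000: 0o1220, 0o02000: 0o5670, 0o02777: 0o6710}
cache = DirectMappedCache(memory)
cache.read(0o00000)   # miss: index 000 is empty
cache.read(0o00000)   # hit
cache.read(0o02000)   # miss: same index 000, different tag
print(cache.hits, cache.misses)
```

Addresses 00000 and 02000 compete for the same cache position, which is the main weakness of direct mapping.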
MEMORY AND CACHE MAPPING - SET ASSOCIATIVE MAPPING -
- Each memory block has a set of locations in the Cache to load
Set Associative Mapping Cache with set size of two

Index   Tag   Data   Tag   Data
000     01    3450   02    5670
777     02    6710   00    2340

Operation
- CPU generates a memory address (TAG; INDEX)
- Access Cache with INDEX, (Cache word = (tag 0, data 0); (tag 1, data 1))
- Compare TAG and tag 0 and then tag 1
- If tag i = TAG -> Hit, CPU <- data i
- If tag i ≠ TAG -> Miss
    Replace either (tag 0, data 0) or (tag 1, data 1)
    Assume (tag 0, data 0) is selected for replacement
    (Why (tag 0, data 0) instead of (tag 1, data 1)?)
    M[tag 0, INDEX] <- Cache[INDEX](data 0)
    Cache[INDEX](tag 0, data 0) <- (TAG, M[TAG,INDEX])
    CPU <- Cache[INDEX](data 0)
BLOCK REPLACEMENT POLICY
Many different block replacement policies are available
LRU (Least Recently Used) is the easiest to implement
Implementation of LRU in the Set Associative Mapping with set size = 2
Modifications
Cache word = (tag 0, data 0, U0); (tag 1, data 1, U1), Ui = 0 or 1 (binary)
Initially all U0 = U1 = 1
When Hit to (tag 0, data 0, U0), U1 <- 1 (least recently used)
(When Hit to (tag 1, data 1, U1), U0 <- 1 (least recently used))
When Miss, find the least recently used one (Ui = 1)
- If U0 = 1 and U1 = 0, then replace (tag 0, data 0)
    M[tag 0, INDEX] <- Cache[INDEX](data 0)
    Cache[INDEX](tag 0, data 0, U0) <- (TAG, M[TAG,INDEX], 0); U1 <- 1
- If U0 = 0 and U1 = 1, then replace (tag 1, data 1)
    Similar to above; U0 <- 1
- If U0 = U1 = 0, this condition does not exist
- If U0 = U1 = 1, both of them are candidates; take arbitrary selection
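The U-bit scheme for one set of a two-way set-associative cache can be sketched as follows. The class name `TwoWaySet` and the `fetch` callback that stands in for a main memory read are assumptions for illustration. Two further assumptions are made: a hit also clears the U bit of the entry that was hit, and the write-back of the victim to memory is left out to keep the sketch short.

```python
class TwoWaySet:
    """One cache set holding two (tag, data, U) entries; U = 1 marks the
    entry that is the least recently used candidate."""

    def __init__(self):
        self.tags = [None, None]
        self.data = [None, None]
        self.u = [1, 1]                    # initially U0 = U1 = 1

    def access(self, tag, fetch):
        """Return (hit, data); `fetch(tag)` supplies the data on a miss."""
        for way in (0, 1):
            if self.tags[way] == tag:      # hit: the other entry becomes LRU
                self.u[way] = 0
                self.u[1 - way] = 1
                return True, self.data[way]
        # Miss: replace an entry whose U bit is 1 (entry 0 if both are 1).
        victim = 0 if self.u[0] == 1 else 1
        self.tags[victim] = tag
        self.data[victim] = fetch(tag)
        self.u[victim] = 0
        self.u[1 - victim] = 1
        return False, self.data[victim]


cache_set = TwoWaySet()
fetch = lambda tag: tag * 10               # stand-in for a memory read
for tag in (1, 2, 1, 3):
    print(tag, cache_set.access(tag, fetch), cache_set.tags)
```

In the run above, tag 3 replaces tag 2 rather than tag 1, because tag 1 was used more recently.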
Cache Memory
CACHE WRITE
Write Through
When writing into memory
- If Hit, both Cache and memory are written in parallel
- If Miss, memory is written
- For a read miss, the missing block may be overloaded onto a cache block
Memory is always updated -> Important when CPU and DMA I/O are both executing
Slow, due to the memory access time
Write-Back (Copy-Back)
When writing into memory
- If Hit, only Cache is written
- If Miss, the missing block is brought to Cache and written into Cache
- For a read miss, the candidate block must be written back to the memory
Memory is not up-to-date, i.e., the same item in Cache and memory may have different values
VIRTUAL MEMORY
Give the programmer the illusion that the system has a very large memory, even though the computer actually has a relatively small main memory
Address Space (Logical) and Memory Space (Physical)

address space                       memory space
virtual address (logical address)   physical address
address generated by programs       actual main memory address

Address Mapping
Memory Mapping Table for Virtual Address -> Physical Address
[Figure: virtual address -> virtual address register -> memory mapping table -> memory table buffer register -> main memory address register (physical address) -> main memory -> main memory buffer register]
ASSOCIATIVE MEMORY PAGE TABLE
Assume that
Number of Blocks in memory = m
Number of Pages in Virtual Address Space = n
Page Table
- Straightforward design -> n entry table in memory
  Inefficient storage space utilization <- n-m entries of the table are empty
- More efficient method is m-entry Page Table
  Page Table made of an Associative Memory
  m words; (Page Number:Block Number)
[Figure: the page number of the virtual address (1 0 1 in the example, followed by the line number) is placed in the argument register; the key register masks the comparison; each word of the associative memory holds a page number and a block number]

Associative memory
Page no.   Block no.
001        11
010        00
101        01
110        10

Page Fault: Page number cannot be found in the Page Table
PAGE FAULT
1. Trap to the OS
2. Save the user registers and program state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of the page on the backing store (disk)
5. Issue a read from the backing store to a free frame
   a. Wait in a queue for this device until serviced
   b. Wait for the device seek and/or latency time
   c. Begin the transfer of the page to a free frame
6. While waiting, the CPU may be allocated to some other process
7. Interrupt from the backing store (I/O completed)
8. Save the registers and program state for the other user
9. Determine that the interrupt was from the backing store
10. Correct the page tables (the desired page is now in memory)
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, program state, and new page table, then resume the interrupted instruction.
Processor architecture should provide the ability to restart any instruction after a page fault.
[Figure: steps in handling a page fault: (1) reference (LOAD M), (2) trap to the OS, (3) page is on backing store, (4) bring in missing page to a free frame in main memory, (5) reset page table, (6) restart instruction]
PAGE REPLACEMENT
Decision on which page to displace to make room for an incoming page when no free frame is available
Modified page fault service routine
1. Find the location of the desired page on the backing store
2. Find a free frame
   - If there is a free frame, use it
   - Otherwise, use a page-replacement algorithm to select a victim frame
   - Write the victim page to the backing store
3. Read the desired page into the (newly) free frame
4. Restart the user process
[Figure: page replacement. (1) swap out victim page from physical memory to backing store, (2) change the victim's page table entry to invalid, (3) swap desired page in from backing store, (4) reset page table for new page. Each page table entry holds a frame number (f) and a valid/invalid bit (v/i)]
PAGE REPLACEMENT ALGORITHMS
Reference string (3 page frames): 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1

FIFO
- FIFO algorithm selects the page that has been in memory the longest time
- Using a queue: every time a page is loaded, its identification is inserted in the queue
- Easy to implement
- May result in frequent page faults
Page frames after each page fault:
7 | 7 0 | 7 0 1 | 2 0 1 | 2 3 1 | 2 3 0 | 4 3 0 | 4 2 0 | 4 2 3 | 0 2 3 | 0 1 3 | 0 1 2 | 7 1 2 | 7 0 2 | 7 0 1

Optimal Replacement (OPT)
- Lowest page fault rate of all algorithms
- Replace that page which will not be used for the longest period of time
- OPT is difficult to implement since it requires future knowledge
Page frames after each page fault:
7 | 7 0 | 7 0 1 | 2 0 1 | 2 0 3 | 2 4 3 | 2 0 3 | 2 0 1 | 7 0 1

LRU
- LRU uses the recent past as an approximation of the near future
- Replace that page which has not been used for the longest period of time
- LRU may require substantial hardware assistance
- The problem is to determine an order for the frames defined by the time of last use
Page frames after each page fault:
7 | 7 0 | 7 0 1 | 2 0 1 | 2 0 3 | 4 0 3 | 4 0 2 | 4 3 2 | 0 3 2 | 1 3 2 | 1 0 2 | 1 0 7
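The three policies can be compared by counting page faults on the reference string used in the figures. This is a sketch in software only; the function names are assumptions for illustration, and real systems approximate LRU in hardware rather than keeping an ordered list.

```python
def count_faults_fifo(refs, frames):
    """FIFO: evict the page that has been in memory the longest."""
    memory, faults = [], 0                 # list ordered by load time
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.pop(0)
            memory.append(page)
    return faults


def count_faults_lru(refs, frames):
    """LRU: evict the page that has not been used for the longest time."""
    memory, faults = [], 0                 # ordered least to most recent
    for page in refs:
        if page in memory:
            memory.remove(page)
        else:
            faults += 1
            if len(memory) == frames:
                memory.pop(0)
        memory.append(page)
    return faults


def count_faults_opt(refs, frames):
    """OPT: evict the page whose next use lies farthest in the future."""
    memory, faults = [], 0
    for i, page in enumerate(refs):
        if page in memory:
            continue
        faults += 1
        if len(memory) < frames:
            memory.append(page)
            continue
        future = refs[i + 1:]
        victim = max(memory,
                     key=lambda p: future.index(p) if p in future else len(future))
        memory[memory.index(victim)] = page
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(count_faults_fifo(refs, 3), count_faults_lru(refs, 3), count_faults_opt(refs, 3))
```

For this reference string and three frames the counts are 15 faults for FIFO, 12 for LRU and 9 for OPT.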
UNIT -6
PERIPHERAL DEVICES
Input Devices
• Keyboard
• Optical input devices
- Card Reader
- Paper Tape Reader
- Bar code reader
- Digitizer
- Optical Mark Reader
• Magnetic Input Devices
- Magnetic Stripe Reader
• Screen Input Devices
- Touch Screen
- Light Pen
- Mouse
• Analog Input Devices
Output Devices
• Card Puncher, Paper Tape Puncher
• CRT
• Printer (Impact, Ink Jet, Laser, Dot Matrix)
• Plotter
• Analog
• Voice
I/O BUS AND INTERFACE MODULES
Each peripheral has an interface module associated with it
Interface
- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises the transfer rate between peripheral and CPU or Memory
Typical I/O instruction (Command): Op. code | Device address | Function code
[Figure: the processor's I/O bus (data, address and control lines) connects to one interface per peripheral: keyboard and display terminal, printer, magnetic disk, magnetic tape]
CONNECTION OF I/O BUS
Connection of I/O Bus to CPU
[Figure: in the CPU, the instruction fields (op. code, device address, function code), the accumulator register and the computer I/O control connect to the I/O bus through sense lines, data lines, function code lines and device address lines]
Connection of I/O Bus to One Interface
[Figure: on the I/O bus, the device address lines feed the interface logic (AD = 1101), the function code lines feed the command decoder, the data lines connect to the buffer register and peripheral register, and the status register drives the sense lines; the interface serves an output peripheral device and controller]
I/O BUS AND MEMORY BUS
Functions of Buses
* MEMORY BUS is for information transfers between CPU and the MM
* I/O BUS is for information transfers between CPU and I/O devices through their I/O interface
Physical Organizations
* Many computers use a common single bus system for both memory and I/O interface units
- Use one common bus but separate control lines for each function
- Use one common bus with common control lines for both functions
* Some computer systems use two separate buses, one to communicate with memory and the other with I/O interfaces
I/O Bus
- Communication between CPU and all interface units is via a common I/O Bus
- An interface connected to a peripheral device may have a number of data registers, a control register, and a status register
- A command is passed to the peripheral by sending it to the appropriate interface register
- Function code and sense lines are not needed (transfer of data, control, and status information is always via the common I/O Bus)
ISOLATED vs MEMORY MAPPED I/O
Isolated I/O
- Separate I/O read/write control lines in addition to memory read/write control lines
- Separate (isolated) memory and I/O address spaces
- Distinct input and output instructions
Memory-mapped I/O
- A single set of read/write control lines (no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space -> reduces memory address range available
- No specific input or output instruction -> The same memory reference instructions can be used for I/O transfers
- Considerable flexibility in handling I/O operations
I/O INTERFACE
- Information in each port can be assigned a meaning depending on the mode of operation of the I/O device
  → Port A = Data; Port B = Command; Port C = Status
- CPU initializes (loads) each port by transferring a byte to the Control Register
  → Allows the CPU to define the mode of operation of each port
  → Programmable Port: By changing the bits in the control register, it is possible to change the interface characteristics

CS   RS1   RS0   Register selected
0    x     x     None - data bus in high-impedance
1    0     0     Port A register
1    0     1     Port B register
1    1     0     Control register
1    1     1     Status register
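The register-select table amounts to a two-bit decode that is enabled by chip select. A minimal sketch, with the function name `select_register` assumed for illustration:

```python
def select_register(cs: int, rs1: int, rs0: int):
    """Return the interface register addressed by CS, RS1 and RS0,
    or None when the chip is not selected (data bus in high impedance)."""
    if cs == 0:
        return None
    registers = ("Port A register", "Port B register",
                 "Control register", "Status register")
    return registers[(rs1 << 1) | rs0]


print(select_register(1, 1, 0))  # Control register
print(select_register(0, 1, 0))  # None
```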
Programmable Interface
[Figure: I/O interface unit between the CPU and the I/O device. On the CPU side: bidirectional data bus into the bus buffers, and chip select (CS), register select (RS1, RS0), I/O read (RD) and I/O write (WR) into timing and control. An internal bus links the bus buffers to the Port A register and Port B register (I/O data), the control register (control) and the status register (status) on the device side]
ASYNCHRONOUS DATA TRANSFER
Synchronous - All devices derive the timing information from common clock lineAsynchronous - No common clock
Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the time at which data is being transmitted
Strobe pulse
- A strobe pulse is supplied by one unit to indicate to the other unit when the transfer has to occur
Handshaking
- A control signal accompanies each data item being transmitted to indicate the presence of data
- The receiving unit responds with another control signal to acknowledge receipt of the data
Synchronous and Asynchronous Operations
Asynchronous Data Transfer
Two Asynchronous Data Transfer Methods
Asynchronous Data Transfer
STROBE CONTROL
* Employs a single control line to time each transfer
* The strobe may be activated by either the source or the destination unit
[Figure: Source-initiated strobe for data transfer. Block diagram: source unit and destination unit joined by a data bus and a strobe line. Timing diagram: valid data on the data bus, strobe pulse.]
[Figure: Destination-initiated strobe for data transfer. Block diagram and timing diagram with the same signals.]
HANDSHAKING
Strobe Methods
Source-Initiated
- The source unit that initiates the transfer has no way of knowing whether the destination unit has actually received the data
Destination-Initiated
- The destination unit that initiates the transfer has no way of knowing whether the source has actually placed the data on the bus
To solve this problem, the HANDSHAKE method introduces a second control signal to provide a reply to the unit that initiates the transfer
SOURCE-INITIATED TRANSFER USING HANDSHAKE
* Allows arbitrary delays from one state to the next
* Permits each unit to respond at its own data transfer rate
* The rate of transfer is determined by the slower unit
[Figure: Block diagram: source unit and destination unit joined by a data bus, a data valid line (source to destination), and a data accepted line (destination to source). Timing diagram: data bus (valid data), data valid, data accepted.]
Sequence of Events
- Source unit: Place data on bus. Enable data valid.
- Destination unit: Accept data from bus. Enable data accepted.
- Source unit: Disable data valid. Invalidate data on bus.
- Destination unit: Disable data accepted. Ready to accept data (initial state).
ASYNCHRONOUS SERIAL TRANSFER
Four Different Types of Transfer
- Asynchronous serial transfer
- Synchronous serial transfer
- Asynchronous parallel transfer
- Synchronous parallel transfer
Asynchronous Serial Transfer
- Employs special bits which are inserted at both ends of the character code
- Each character consists of three parts: start bit, data bits, stop bits
A character can be detected by the receiver from knowledge of four rules:
- When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected by a start bit, which is always a 0
- The character bits always follow the start bit
- After the last character bit, a stop bit is detected when the line returns to the 1-state for at least 1 bit time
The receiver knows in advance the transfer rate of the bits and the number of information bits to expect
[Figure: Character format: start bit (1 bit), character bits 1 1 0 0 0 1 0 1, stop bits (at least 1 bit).]
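The framing rules for asynchronous serial transfer can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the line is modelled as a list of 0/1 levels with one entry per bit time, and the data bits are assumed to be sent in the order given.

```python
def frame_character(data_bits, stop_bits=1):
    """Return the line levels for one character: start bit, data bits, stop bits."""
    if stop_bits < 1:
        raise ValueError("at least one stop bit is required")
    # Start bit is always 0; stop bits return the line to the 1-state.
    return [0] + list(data_bits) + [1] * stop_bits


def receive_character(line, n_data_bits):
    """Skip the idle 1-state, detect the start bit, and return the data bits."""
    i = 0
    while i < len(line) and line[i] == 1:  # line idles in the 1-state
        i += 1
    if i == len(line):
        return None  # no start bit seen: nothing was transmitted
    data = line[i + 1 : i + 1 + n_data_bits]
    stop_index = i + 1 + n_data_bits
    # The receiver knows the number of data bits in advance; the next bit
    # must be a stop bit (1), otherwise the frame is invalid.
    if len(data) < n_data_bits or stop_index >= len(line) or line[stop_index] != 1:
        raise ValueError("framing error")
    return data
```

For example, framing the character bits 1 1 0 0 0 1 0 1 with two stop bits gives eleven line levels, and the receiver recovers the same eight bits after any number of idle 1s.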
UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER - UART -
A typical asynchronous communication interface available as an IC
Transmitter Register
- Accepts a data byte (from CPU) through the data bus
- The byte is transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- The complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits
- Define baud rate, number of bits in each character, whether to generate and check parity, and number of stop bits
[Figure: Block diagram of the interface. CPU side: bidirectional data bus through bus buffers; chip select (CS), register select (RS), I/O read (RD), and I/O write (WR) into timing and control. An internal bus connects the transmitter, control, status, and receiver registers. The transmitter register feeds a shift register (transmitter control and clock, transmit data); a second shift register (receiver control and clock, receive data) feeds the receiver register.]
CS  RS  Oper.  Register selected
0   x   x      None
1   0   WR     Transmitter register
1   1   WR     Control register
1   0   RD     Receiver register
1   1   RD     Status register
FIRST-IN-FIRST-OUT (FIFO) BUFFER
* Input data and output data at two different rates
* Output data are always in the same order in which the data entered the buffer
* Useful in some applications when data are transferred asynchronously
4 x 4 FIFO Buffer: four 4-bit registers Ri and four control flip-flops Fi, one associated with each Ri
[Figure: 4 x 4 FIFO buffer. Four 4-bit registers R1 to R4 in series between data input and data output, each clocked under control of an SR flip-flop with outputs Fi and Fi'. Control signals: insert, input ready, delete, output ready, master clear.]
MODES OF TRANSFER - PROGRAM-CONTROLLED I/O -
Three different data transfer modes between the central computer (CPU or memory) and peripherals:
- Program-Controlled I/O
- Interrupt-Initiated I/O
- Direct Memory Access (DMA)
Program-Controlled I/O (Input Device to CPU)
Polling or Status Checking
- Continuous CPU involvement
- CPU slowed down to I/O speed
- Simple
- Least hardware
[Figure: Data transfer from I/O device to CPU. The CPU connects to the interface through the data bus, address bus, I/O read, and I/O write lines; the interface holds a data register and a status register with flag F, and connects to the I/O device through the I/O bus, data valid, and data accepted lines.]
[Figure: Flowchart. Read status register and check flag bit; if flag = 0, repeat; if flag = 1, read data register and transfer data to memory. Operation complete? If no, repeat; if yes, continue with program.]
MODES OF TRANSFER - INTERRUPT INITIATED I/O & DMA
Interrupt Initiated I/O
- Polling takes valuable CPU time
- Open communication only when some data has to be passed -> Interrupt
- The I/O interface, instead of the CPU, monitors the I/O device
- When the interface determines that the I/O device is ready for data transfer, it generates an Interrupt Request to the CPU
- Upon detecting an interrupt, the CPU momentarily stops the task it is doing, branches to the service routine to process the data transfer, and then returns to the task it was performing
DMA (Direct Memory Access)
- Large blocks of data transferred at high speed to or from high speed devices: magnetic drums, disks, tapes, etc.
- DMA controller: interface that provides I/O transfer of data directly between the memory and the I/O device
- CPU initializes the DMA controller by sending a memory address and the number of words to be transferred
- Actual transfer of data is done directly between the device and memory through the DMA controller -> freeing the CPU for other tasks
PRIORITY INTERRUPT
Priority
- Determines which interrupt is to be served first when two or more requests are made simultaneously
- Also determines which interrupts are permitted to interrupt the computer while another is being serviced
- Higher priority interrupts can make requests while a lower priority interrupt is being serviced
Priority Interrupt by Software (Polling)
- Priority is established by the order of polling the devices (interrupt sources)
- Flexible, since it is established by software
- Low cost, since it needs very little hardware
- Very slow
Priority Interrupt by Hardware
- Requires a priority interrupt manager which accepts all the interrupt requests and determines the highest priority request
- Fast, since the highest priority interrupt request is identified by the hardware
- Fast, since each interrupt source has its own interrupt vector to access its own service routine directly
HARDWARE PRIORITY INTERRUPT - DAISY-CHAIN -
One stage of the daisy-chain priority arrangement
PI  RF  PO  Enable
0   0   0   0
0   1   0   0
1   0   1   0
1   1   0   1
- An interrupt request from one or more devices -> CPU responds with INTACK <- 1
- A requesting device that receives INTACK = 1 at its PI input puts its VAD on the bus
- Among the requesting devices, only the one physically closest to the CPU receives INTACK = 1; it blocks INTACK from propagating to the next device
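The behaviour of the chain can be sketched in Python, assuming the usual stage logic PO = PI AND (NOT RF) and Enable = PI AND RF, where RF is the request flip-flop of the stage. The function names are hypothetical, and the device list is assumed to be ordered from the device nearest the CPU outward.

```python
def daisy_chain_stage(pi, rf):
    """One stage: return (PO, Enable) for priority-in PI and request flag RF."""
    po = int(bool(pi) and not bool(rf))    # pass the acknowledge on only if not requesting
    enable = int(bool(pi) and bool(rf))    # place VAD on the bus if requesting and acknowledged
    return po, enable


def acknowledge(requests):
    """Return the index of the device that wins INTACK, or None if none requested."""
    pi = 1  # INTACK from the CPU enters the first device's PI input
    for index, rf in enumerate(requests):
        po, enable = daisy_chain_stage(pi, rf)
        if enable:
            return index
        pi = po  # PO of this stage is PI of the next stage
    return None
```

With requests from devices 2 and 3 only, `acknowledge([0, 1, 1])` selects index 1, because the second device blocks the acknowledge from reaching the third.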
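* Serial hardware priority function
* Interrupt Request Line - single common line
* Interrupt Acknowledge Line - daisy chain
[Figure: Daisy-chain priority interrupt. Devices 1, 2, and 3 place VAD 1, VAD 2, and VAD 3 on the processor data bus; the CPU's INTACK (interrupt acknowledge) enters PI of device 1, and each PO feeds the PI of the next device; all devices share the INT (interrupt request) line to the CPU.]
[Figure: One stage of the daisy chain. An SR flip-flop (RF) is set by the interrupt request from the device; inputs PI (priority in), outputs PO (priority out), Enable for the vector address (VAD), and the interrupt request to the CPU; a delay element is included.]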
PARALLEL PRIORITY INTERRUPT
IEN: Set or cleared by instructions ION or IOF
IST: Indicates that an unmasked interrupt has occurred. INTACK enables the tristate bus buffer to load the VAD generated by the priority logic
Interrupt Register
- Each bit is associated with an interrupt request from a different interrupt source, at a different priority level
- Each bit can be cleared by a program instruction
Mask Register
- The mask register is associated with the interrupt register
- Each bit can be set or cleared by an instruction
[Figure: Priority interrupt hardware. Interrupt register bits 0 to 3 receive requests from the disk, printer, reader, and keyboard; each bit is ANDed with the corresponding mask register bit to form priority encoder inputs I0 to I3. The encoder outputs x and y form the low-order bits of the VAD (the other bits are 0), which passes to the CPU through a bus buffer enabled by INTACK from the CPU. IST and IEN together generate the interrupt to the CPU.]
INTERRUPT PRIORITY ENCODER
Determines the highest priority interrupt when more than one interrupt takes place
Priority Encoder Truth Table
Inputs            Outputs
I0  I1  I2  I3    x  y  IST
1   d   d   d     0  0  1
0   1   d   d     0  1  1
0   0   1   d     1  0  1
0   0   0   1     1  1  1
0   0   0   0     d  d  0
Boolean functions
x = I0' I1'
y = I0' I1 + I0' I2'
IST = I0 + I1 + I2 + I3
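The Boolean functions of the priority encoder can be checked against its truth table with a short Python sketch. The function name is hypothetical; inputs are 0/1 integers, and the don't-care outputs for the all-zero input simply take whatever the equations produce.

```python
def priority_encoder(i0, i1, i2, i3):
    """Return (x, y, IST) for the four interrupt inputs, I0 having highest priority."""
    n0, n1, n2 = 1 - i0, 1 - i1, 1 - i2   # complements I0', I1', I2'
    x = n0 & n1                           # x = I0' I1'
    y = (n0 & i1) | (n0 & n2)             # y = I0' I1 + I0' I2'
    ist = i0 | i1 | i2 | i3               # IST = I0 + I1 + I2 + I3
    return x, y, ist
```

Reading (x, y) as a two-bit number gives the index of the highest-priority active input, for example `priority_encoder(0, 0, 1, 1)` yields x = 1, y = 0, selecting I2.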
INTERRUPT SERVICE ROUTINE
Initial and Final Operations
Each interrupt service routine must have an initial and final set of operations for controlling the registers in the hardware interrupt system
Initial Sequence
[1] Clear lower level Mask reg. bits
[2] IST <- 0
[3] Save contents of CPU registers
[4] IEN <- 1
[5] Go to Interrupt Service Routine
Final Sequence
[1] IEN <- 0
[2] Restore CPU registers
[3] Clear the bit in the Interrupt Reg
[4] Set lower level Mask reg. bits
[5] Restore return address, IEN <- 1
[Figure: Programs stored in memory for servicing interrupts. Memory addresses 0 to 3 hold JMP DISK, JMP PTR, JMP RDR, and JMP KBD; the I/O service programs service the magnetic disk (DISK), line printer (PTR), character reader (RDR), and keyboard (KBD). In the example, the main program is interrupted by the keyboard at address 749 (VAD = 00000011), and the keyboard routine is interrupted by the disk at address 255; the stack holds the return addresses 750 and 256.]
INTERRUPT CYCLE
At the end of each instruction cycle
- CPU checks IEN and IST
- If IEN · IST = 1, CPU -> Interrupt Cycle
SP <- SP - 1     Decrement stack pointer
M[SP] <- PC      Push PC into stack
INTACK <- 1      Enable interrupt acknowledge
PC <- VAD        Transfer vector address to PC
IEN <- 0         Disable further interrupts
Go to Fetch to execute the first instruction in the interrupt service routine
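The microoperations of the interrupt cycle can be traced with a small Python sketch. The dictionary-based CPU state and the function name are assumptions made for illustration only; memory is modelled as a dictionary indexed by address.

```python
def interrupt_cycle(cpu, memory, vad):
    """Perform the interrupt cycle if IEN and IST are both 1; return True if taken."""
    if not (cpu["IEN"] and cpu["IST"]):
        return False               # no interrupt cycle: continue with the next fetch
    cpu["SP"] -= 1                 # SP <- SP - 1
    memory[cpu["SP"]] = cpu["PC"]  # M[SP] <- PC (save return address)
    cpu["INTACK"] = 1              # INTACK <- 1
    cpu["PC"] = vad                # PC <- VAD
    cpu["IEN"] = 0                 # IEN <- 0 (disable further interrupts)
    return True
```

After the call, the next fetch takes the instruction at the vector address, and the return address sits on top of the stack.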
DIRECT MEMORY ACCESS
* Block data transfer from high speed devices: drum, disk, tape
* DMA controller - Interface which allows I/O transfer directly between memory and device, freeing the CPU for other tasks
* CPU initializes the DMA controller by sending the memory address and the block size (number of words)
[Figure: CPU bus signals for DMA transfer. Bus request (BR) input and bus granted (BG) output; the address bus (ABUS), data bus (DBUS), read (RD), and write (WR) lines are in the high-impedance (disabled) state when BG is enabled.]
[Figure: Block diagram of DMA controller. Data bus buffers and address bus buffers; control logic with DMA select (DS), register select (RS), read (RD), write (WR), bus request (BR), bus grant (BG), and interrupt lines; an internal bus connecting the address register, word count register, and control register; DMA request and DMA acknowledge lines to the I/O device.]
DMA I/O OPERATION
Starting an I/O
- CPU executes instructions to
  Load Memory Address Register
  Load Word Counter
  Load Function (Read or Write) to be performed
  Issue a GO command
Upon receiving a GO command, the DMA controller performs the I/O operation as follows, independently of the CPU
Input
[1] Input Device <- R (Read control signal)
[2] Buffer (DMA Controller) <- Input Byte; the bytes are assembled into a word until the word is full
[4] M <- memory address, W (Write control signal)
[5] Address Reg <- Address Reg + 1; WC (Word Counter) <- WC - 1
[6] If WC = 0, then interrupt to acknowledge done, else go to [1]
Output
[1] M <- memory address, R (Read control signal); Address Reg <- Address Reg + 1; WC <- WC - 1
[2] Disassemble the word
[3] Buffer <- One byte; Output Device <- W, for all disassembled bytes
[4] If WC = 0, then interrupt to acknowledge done, else go to [1]
CYCLE STEALING
While DMA I/O takes place, CPU is also executing instructions
DMA Controller and CPU both access Memory -> Memory Access Conflict
Memory Bus Controller
- Coordinating the activities of all devices requesting memory access - Priority System
Memory accesses by CPU and DMA Controller are interwoven, with the top priority given to DMA Controller -> Cycle Stealing
Cycle Steal
- The CPU is usually much faster than I/O (DMA), thus the CPU uses most of the memory cycles
- The DMA controller steals memory cycles from the CPU
- For those stolen cycles, the CPU remains idle
- With a slow CPU, the DMA controller may steal most of the memory cycles, which may cause the CPU to remain idle for a long time
DMA TRANSFER
[Figure: DMA transfer in a computer system. The CPU, random-access memory (RAM), and DMA controller share the address bus, data bus, read control, and write control lines; the CPU and DMA controller exchange BR, BG, and interrupt signals; address select logic drives DS and RS on the DMA controller; the DMA controller exchanges DMA request and DMA acknowledge with the I/O peripheral device.]
INPUT/OUTPUT PROCESSOR - CHANNEL -
Channel
- Processor with direct memory access capability that communicates with I/O devices
- Channel accesses memory by cycle stealing
- Channel can execute a Channel Program
  - Stored in the main memory
  - Consists of Channel Command Words (CCW)
  - Each CCW specifies the parameters needed by the channel to control the I/O devices and perform data transfer operations
- CPU initiates the channel by executing a channel I/O class instruction; once initiated, the channel operates independently of the CPU
[Figure: The central processing unit (CPU) and input-output processor (IOP) share the memory unit over the memory bus; the IOP connects to the peripheral devices (PD) over the I/O bus.]
CHANNEL / CPU COMMUNICATION
[Flowchart: CPU operations and IOP operations, in sequence]
- CPU: Send instruction to test IOP path
- IOP: Transfer status word to memory
- CPU: If status OK, then send start I/O instruction to IOP
- IOP: Access memory for IOP program
- CPU: Continue with another program
- IOP: Conduct I/O transfers using DMA; prepare status report
- IOP: I/O transfer completed; interrupt CPU
- CPU: Request IOP status
- IOP: Transfer status word to memory location
- CPU: Check status word for correct transfer
- CPU: Continue
UNIT - 7
PIPELINING AND VECTOR PROCESSING
- Parallel Processing
- Pipelining
- Arithmetic Pipeline
- Instruction Pipeline
- RISC Pipeline
- Vector Processing
- Array Processors
PARALLEL PROCESSING
Parallel processing denotes the simultaneous occurrence of data processing tasks for the purpose of increasing the computational speed of a computer system, or, equivalently, the execution of concurrent events in the computing process to achieve faster computational speed
Levels of Parallel Processing
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
PARALLEL COMPUTERS
Architectural Classification
Flynn's classification
- Based on the multiplicity of Instruction Streams and Data Streams
- Instruction Stream: sequence of instructions read from memory
- Data Stream: operations performed on the data in the processor
                                    Number of Data Streams
                                    Single    Multiple
Number of Instruction    Single     SISD      SIMD
Streams                  Multiple   MISD      MIMD
COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING
Von Neumann based
- SISD
  - Superscalar processors
  - Superpipelined processors
  - VLIW
- MISD
  - Nonexistence
- SIMD
  - Array processors
  - Systolic arrays
  - Associative processors
- MIMD
  - Shared-memory multiprocessors: bus based, crossbar switch based, multistage IN based
  - Message-passing multicomputers: hypercube, mesh, reconfigurable
Dataflow
Reduction
SIMD COMPUTER SYSTEMS
[Figure: SIMD organization. A control unit with its memory broadcasts a single instruction stream over the data bus to the processor units (P); an alignment network connects the processor units to the memory modules (M), carrying the data streams.]
Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
TYPES OF SIMD COMPUTERS
Array Processors
- The control unit broadcasts instructions to all PEs, and all active PEs execute the same instructions
- ILLIAC IV, GF-11, Connection Machine, DAP, MPP
Systolic Arrays
- Regular arrangement of a large number of very simple processors constructed on VLSI circuits
- CMU Warp, Purdue CHiP
Associative Processors
- Content addressing
- Data transformation operations over many sets of arguments with a single instruction
- STARAN, PEPE
PIPELINING
A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.
Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7
[Figure: Ai and Bi from memory load R1 and R2 (segment 1); the multiplier output loads R3 while Ci from memory loads R4 (segment 2); the adder output loads R5 (segment 3).]
OPERATIONS IN EACH PIPELINE STAGE
R1 <- Ai, R2 <- Bi          Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci     Multiply and load Ci
R5 <- R3 + R4               Add
Clock pulse   Segment 1    Segment 2         Segment 3
number        R1    R2     R3        R4      R5
1             A1    B1
2             A2    B2     A1 * B1   C1
3             A3    B3     A2 * B2   C2      A1 * B1 + C1
4             A4    B4     A3 * B3   C3      A2 * B2 + C2
5             A5    B5     A4 * B4   C4      A3 * B3 + C3
6             A6    B6     A5 * B5   C5      A4 * B4 + C4
7             A7    B7     A6 * B6   C6      A5 * B5 + C5
8                          A7 * B7   C7      A6 * B6 + C6
9                                            A7 * B7 + C7
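The three-segment example can be simulated clock by clock in Python. This is a sketch only: the function name is hypothetical, empty registers are modelled as None, and all registers are assumed to load on the same clock pulse, so each new value is computed from the old register contents.

```python
def run_pipeline(a, b, c):
    """Clock a three-segment pipeline that computes a[i] * b[i] + c[i]."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    for clock in range(1, n + 3):          # n tasks need n + 2 clock pulses
        # Segment 3: R5 <- R3 + R4
        new_r5 = r3 + r4 if r3 is not None else None
        # Segment 2: R3 <- R1 * R2, R4 <- Ci
        new_r3 = r1 * r2 if r1 is not None else None
        new_r4 = c[clock - 2] if r1 is not None else None
        # Segment 1: R1 <- Ai, R2 <- Bi
        if clock <= n:
            new_r1, new_r2 = a[clock - 1], b[clock - 1]
        else:
            new_r1 = new_r2 = None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)
    return results
```

With seven input triples the loop runs for nine clock pulses, and the first result appears in R5 on the third pulse.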
GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
[Figure: Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with all registers driven by a common clock.]
Space-Time Diagram
Clock cycles   1   2   3   4   5   6   7   8   9
Segment 1      T1  T2  T3  T4  T5  T6
Segment 2          T1  T2  T3  T4  T5  T6
Segment 3              T1  T2  T3  T4  T5  T6
Segment 4                  T1  T2  T3  T4  T5  T6
PIPELINE SPEEDUP
n: Number of tasks to be performed
Conventional Machine (Non-Pipelined)
- tn: Clock cycle
- Time required to complete the n tasks = n * tn
Pipelined Machine (k stages)
- tp: Clock cycle (time to complete each suboperation)
- Time required to complete the n tasks = (k + n - 1) * tp
Speedup
Sk: Speedup
$$S_k = \frac{n \, t_n}{(k + n - 1) \, t_p}$$
$$\lim_{n \to \infty} S_k = \frac{t_n}{t_p} \quad (= k, \text{ if } t_n = k \, t_p)$$
PIPELINE AND MULTIPLE FUNCTION UNITS
Example
- 4-stage pipeline
- Suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in non-pipelined system: 20 * 4 = 80 ns
Pipelined System
(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns
Non-Pipelined System
n * k * tp = 100 * 80 = 8000 ns
Speedup
Sk = 8000 / 2060 = 3.88
Multiple Functional Units
[Figure: Four parallel function units P1 to P4 processing instructions I(i), I(i+1), I(i+2), I(i+3).]
A 4-stage pipeline is basically identical to a system with 4 identical function units
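The speedup formula and the worked example can be reproduced with a short Python function. The function name and parameter names are chosen here for illustration; when tn is not given, it is assumed to equal k * tp, as in the example.

```python
def pipeline_speedup(n, k, tp, tn=None):
    """Speedup of a k-stage pipeline over a non-pipelined unit for n tasks."""
    if tn is None:
        tn = k * tp                      # assumption: one task takes k suboperation times
    non_pipelined = n * tn               # time without pipelining
    pipelined = (k + n - 1) * tp         # time with pipelining
    return non_pipelined / pipelined


# Worked example: 4 stages, tp = 20 ns, 100 tasks
speedup = pipeline_speedup(n=100, k=4, tp=20)
```

As n grows, the value approaches k, which is the limiting case given in the speedup formula.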
ARITHMETIC PIPELINE
Floating-point adder
X = A x 2^a
Y = B x 2^b
[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result
[Figure: Pipeline for floating-point addition and subtraction, with registers (R) between segments. Segment 1: compare exponents a and b by subtraction. Segment 2: choose exponent; align mantissas A and B using the exponent difference. Segment 3: add or subtract mantissas. Segment 4: adjust exponent; normalize result.]
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline
Conventional
i      FI DA FO EX
i+1                FI DA FO EX
i+2                            FI DA FO EX
Pipelined
i      FI DA FO EX
i+1       FI DA FO EX
i+2          FI DA FO EX
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE
[Figure: Timing of instruction execution in a 4-stage pipeline (FI, DA, FO, EX) for instructions 1 to 7 over steps 1 to 13; instruction 3 is a branch.]
[Figure: Flowchart of the four-segment instruction pipeline. Segment 1: fetch instruction from memory. Segment 2: decode instruction and calculate effective address; branch? Segment 3: fetch operand from memory. Segment 4: execute instruction; interrupt? A branch or interrupt leads to interrupt handling, update PC, and empty pipe.]
MAJOR HAZARDS IN PIPELINED EXECUTION
Structural hazards (Resource Conflicts)
- Hardware resources required by the instructions in simultaneous overlapped execution cannot be met
Data hazards (Data Dependency Conflicts)
- An instruction scheduled to be executed in the pipeline requires the result of a previous instruction, which is not yet available
  R1 <- B + C
  R1 <- R1 + 1
Hardware Techniques
- Interlock: hardware detects the data dependencies and delays the scheduling of the dependent instruction by stalling enough clock cycles
- Forwarding (bypassing, short-circuiting): accomplished by a data path that routes a value from a source (usually an ALU) to a user, bypassing a designated register. This allows the value to be used at an earlier stage in the pipeline than would otherwise be possible
Software Technique
- Instruction scheduling (compiler) for delayed load
Control hazards
- Arise from branches and other instructions that change the PC
DATA HAZARDS
A data hazard occurs when the execution of an instruction depends on the result of a previous instruction
  ADD R1, R2, R3
  SUB R4, R1, R5
Data hazards can be dealt with by either hardware techniques or software techniques
CONTROL HAZARDS
Branch Instructions
- Branch target address is not known until the branch instruction is completed
- Stall -> waste of cycle times
[Figure: Branch instruction FI DA FO EX; the next instruction's FI DA FO EX cannot start until the target address is available.]
Dealing with Control Hazards
* Prefetch Target Instruction
* Branch Target Buffer
* Loop Buffer
* Branch Prediction
* Delayed Branch
CONTROL HAZARDS
Branches and other instructions that change the PC delay the fetch of the next instruction
Prefetch Target Instruction
- Fetch instructions in both streams, branch not taken and branch taken
- Both are saved until the branch is executed. Then the right instruction stream is selected and the wrong stream is discarded
Branch Target Buffer (BTB; Associative Memory)
- Entry: address of a previously executed branch; target instruction and the next few instructions
- When fetching an instruction, search the BTB
- If found, fetch the instruction stream from the BTB; if not, the new stream is fetched and the BTB is updated
Loop Buffer (High Speed Register File)
- Storage of an entire loop, which allows the loop to be executed without accessing memory
Branch Prediction
- Guess the branch condition, and fetch an instruction stream based on the guess. A correct guess eliminates the branch penalty
Delayed Branch
- Compiler detects the branch and rearranges the instruction sequence by inserting useful instructions that keep the pipeline busy in the presence of a branch instruction
RISC PIPELINE
Instruction Cycles of Three-Stage Instruction Pipeline
RISC
- Machine with a very fast clock cycle that executes at the rate of one instruction per cycle
  <- Simple instruction set
  <- Fixed-length instruction format
  <- Register-to-register operations
Data Manipulation Instructions
  I: Instruction Fetch
  A: Decode, Read Registers, ALU Operations
  E: Write a Register
Load and Store Instructions
  I: Instruction Fetch
  A: Decode, Evaluate Effective Address
  E: Register-to-Memory or Memory-to-Register
Program Control Instructions
  I: Instruction Fetch
  A: Decode, Evaluate Branch Address
  E: Write Register (PC)
DELAYED LOAD
LOAD:   R1 <- M[address 1]
LOAD:   R2 <- M[address 2]
ADD:    R3 <- R1 + R2
STORE:  M[address 3] <- R3
Three-segment pipeline timing
Pipeline timing with data conflict
Clock cycle   1  2  3  4  5  6
Load R1       I  A  E
Load R2          I  A  E
Add R1+R2           I  A  E
Store R3               I  A  E
Pipeline timing with delayed load
Clock cycle   1  2  3  4  5  6  7
Load R1       I  A  E
Load R2          I  A  E
NOP                 I  A  E
Add R1+R2              I  A  E
Store R3                  I  A  E
The data dependency is taken care of by the compiler rather than the hardware
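A compiler pass for delayed load can be sketched as follows. This is a simplified illustration under stated assumptions, not a real compiler algorithm: instructions are hypothetical (operation, destination, sources) tuples, and a NOP is inserted only when an instruction uses the register loaded by the immediately preceding LOAD.

```python
def insert_delayed_load_nops(program):
    """Insert a NOP after a LOAD whose destination is used by the next instruction."""
    scheduled = []
    for instr in program:
        if scheduled:
            prev_op, prev_dest, _ = scheduled[-1]
            # The loaded value is not yet available to the instruction that follows.
            if prev_op == "LOAD" and prev_dest in instr[2]:
                scheduled.append(("NOP", None, ()))
        scheduled.append(instr)
    return scheduled


program = [
    ("LOAD", "R1", ("M1",)),
    ("LOAD", "R2", ("M2",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M3", ("R3",)),
]
ops = [op for op, _, _ in insert_delayed_load_nops(program)]
```

For the four-instruction program this yields LOAD, LOAD, NOP, ADD, STORE, the same order as the delayed-load timing.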
DELAYED BRANCH
Compiler analyzes the instructions before and after the branch and rearranges the program sequence by inserting useful instructions in the delay steps
Using no-operation instructions
Clock cycles:     1  2  3  4  5  6  7  8  9  10
1. Load A         I  A  E
2. Increment         I  A  E
3. Add                  I  A  E
4. Subtract                I  A  E
5. Branch to X                I  A  E
6. NOP                           I  A  E
7. NOP                              I  A  E
8. Instr. in X                         I  A  E
Rearranging the instructions
Clock cycles:     1  2  3  4  5  6  7  8
1. Load A         I  A  E
2. Increment         I  A  E
3. Branch to X          I  A  E
4. Add                     I  A  E
5. Subtract                   I  A  E
6. Instr. in X                   I  A  E
VECTOR INSTRUCTIONS
f1: V -> V
f2: V -> S
f3: V x V -> V
f4: V x S -> V
V: Vector operand
S: Scalar operand
Type  Mnemonic  Description (I = 1, ..., n)
f1    VSQR      Vector square root       B(I) <- SQR(A(I))
      VSIN      Vector sine              B(I) <- sin(A(I))
      VCOM      Vector complement        A(I) <- A(I)'
f2    VSUM      Vector summation         S <- sum of A(I)
      VMAX      Vector maximum           S <- max{A(I)}
f3    VADD      Vector add               C(I) <- A(I) + B(I)
      VMPY      Vector multiply          C(I) <- A(I) * B(I)
      VAND      Vector AND               C(I) <- A(I) . B(I)
      VLAR      Vector larger            C(I) <- max(A(I), B(I))
      VTGE      Vector test >            C(I) <- 0 if A(I) < B(I); C(I) <- 1 if A(I) > B(I)
f4    SADD      Vector-scalar add        B(I) <- S + A(I)
      SDIV      Vector-scalar divide     B(I) <- A(I) / S
VECTOR INSTRUCTION FORMAT
Vector Instruction Format
Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length
Pipeline for Inner Product
[Figure: Source A and Source B feed a multiplier pipeline, whose output feeds an adder pipeline.]
MULTIPLE MEMORY MODULE AND INTERLEAVING
Multiple Module Memory
[Figure: Four memory modules M0 to M3, each with its own address register (AR), memory array, and data register (DR), connected to a common address bus and data bus.]
Address Interleaving
- Different sets of addresses are assigned to different memory modules
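One common way to assign addresses to modules is low-order interleaving, in which consecutive addresses fall in consecutive modules. The sketch below assumes that scheme and four modules; the slides do not say which assignment is used, and the function name is hypothetical.

```python
def interleave(address, modules=4):
    """Map a memory address to (module number, word within that module)."""
    # Low-order interleaving: the low bits of the address pick the module.
    return address % modules, address // modules


module_sequence = [interleave(a)[0] for a in range(8)]
```

Addresses 0 to 7 then map to modules 0, 1, 2, 3, 0, 1, 2, 3, so a vector stored at consecutive addresses can be fetched from all four modules in overlapped fashion.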
UNIT - 8
MULTIPROCESSORS :
A multiprocessor system is an interconnection of two or more CPUs with memory and input-output equipment.
Parallel Computing
- Simultaneous use of multiple processors, all components of a single architecture, to solve a task. Typically the processors are identical and serve a single user (even if the machine is multiuser)
Distributed Computing
- Use of a network of processors, each capable of being viewed as a computer in its own right, to solve a problem. Processors may be heterogeneous and multiuser; usually an individual task is assigned to a single processor
Pipelining
- Breaking a task into steps performed by different units, with multiple inputs streaming through the units; the next input starts in a unit when the previous input is done with that unit, but not necessarily done with the task
Vector Computing
- Use of vector processors, where an operation such as multiply is broken into several steps and applied to a stream of operands ("vectors"). The most common special case of pipelining
Systolic
- Similar to pipelining, but units are not necessarily arranged linearly; steps are typically small and more numerous, and are performed in lockstep fashion. Often used in special-purpose hardware such as image or signal processors
Types Of Multiprocessors:
Tightly Coupled System
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicates through a common shared memory
- Shared memory system
Loosely Coupled System
- Tasks or processors do not communicate in a synchronized fashion
- Communicates by message passing packets
- Overhead for data exchange is high
- Distributed memory system
INTERCONNECTION STRUCTURES
* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System
Bus
- All processors (and memory) are connected to a common bus or busses
- Memory access is fairly uniform, but not very scalable
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements
Operations of Bus
[Figure: Devices M3, S7, M6, S5, M4, and S2 attached to a common bus.]
M3 wishes to communicate with S5
[1] M3 sends signals (address) on the bus that cause S5 to respond
[2] M3 sends data to S5 or S5 sends data to M3 (determined by the command line)
Master Device: Device that initiates and controls the communication
Slave Device: Responding device
Multiple-master buses -> Bus conflict -> need bus arbitration
SYSTEM BUS STRUCTURE FOR MULTIPROCESSORS
[Figure: System bus structure for multiprocessors. A common shared memory and several processor groups attach to the system bus; each group has a system bus controller, a CPU, local memory, and (in some groups) an IOP on its own local bus.]
MULTIPORT MEMORY
Multiport Memory Module
- Each port serves a CPU
Memory Module Control Logic
- Each memory module has control logic
- Resolves memory module conflicts by fixed priority among CPUs
Advantages
- Multiple paths -> high transfer rate
Disadvantages
- Memory control logic
- Large number of cables and connections
CROSSBAR SWITCH
[Figure: Crossbar switch connecting CPU 1 to CPU 4 (rows) with memory modules MM 1 to MM 4 (columns).]
[Figure: Block diagram of crossbar switch. Data, address, and control lines from CPU 1 to CPU 4 enter multiplexers and arbitration logic, which drive the data, address, R/W, and memory enable lines of the memory module.]
[Figure: Operation of a 2 x 2 interchange switch with inputs A and B and outputs 0 and 1: A connected to 0, A connected to 1, B connected to 0, B connected to 1.]
MULTISTAGE SWITCHING NETWORK
Interstage Switch
MULTISTAGE INTERCONNECTION NETWORK
[Figure: Binary tree with 2 x 2 switches connecting processors P1 and P2 to destinations 000 to 111; each switch output is labelled 0 or 1.]
[Figure: 8 x 8 omega switching network connecting inputs 0 to 7 with outputs 000 to 111.]
HYPERCUBE INTERCONNECTION
- p = 2^n processors
- Processors are conceptually on the corners of an n-dimensional hypercube, and each is directly connected to its n neighboring nodes
- Degree = n
n-dimensional hypercube (binary n-cube)
[Figure: One-cube (nodes 0, 1), two-cube (nodes 00, 01, 10, 11), and three-cube (nodes 000 to 111).]
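The neighbor structure of the hypercube follows directly from the binary node labels: two nodes are connected when their labels differ in exactly one bit. A minimal Python sketch, with a hypothetical function name and nodes represented as integers:

```python
def neighbors(node, n):
    """Return the n neighbors of a node in an n-dimensional hypercube."""
    # Flipping each of the n address bits in turn gives one neighbor per dimension.
    return [node ^ (1 << bit) for bit in range(n)]


three_cube = {node: neighbors(node, 3) for node in range(2 ** 3)}
```

In the three-cube, node 000 is connected to 001, 010, and 100, and every node has degree 3.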
INTERPROCESSOR ARBITRATION
Bus
- Board level bus
- Backplane level bus
- Interface level bus
System Bus - A Backplane level bus
- Printed circuit board
- Connects CPU, IOP, and Memory
- Each CPU, IOP, and Memory board can be plugged into a slot in the backplane (system bus)
- Bus signals are grouped into 3 groups: Data, Address, and Control (plus power)
- Only one of CPU, IOP, and Memory can be granted use of the bus at a time
- An arbitration mechanism is needed to handle multiple requests
e.g. IEEE standard 796 bus - 86 lines
  Data: 16 (multiple of 8)
  Address: 24
  Control: 26
  Power: 20
SYNCHRONOUS & ASYNCHRONOUS DATA TRANSFER
Synchronous Bus
* Each data item is transferred over a time slice known to both the source and destination units
  - Common clock source
  - Or separate clocks, with a synchronization signal transmitted periodically to synchronize the clocks in the system
Asynchronous Bus
* Each data item is transferred by a handshake mechanism
  - The unit that transmits the data transmits a control signal that indicates the presence of data
  - The unit that receives the data responds with another control signal to acknowledge the receipt of the data
* Strobe pulse - supplied by one of the units to indicate to the other unit when the data transfer has to occur
BUS SIGNALS
IEEE Standard 796 Multibus Signals (Cont’d)
Miscellaneous control
  Master clock                CCLK
  System initialization       INIT
  Byte high enable            BHEN
  Memory inhibit (2 lines)    INH1 - INH2
  Bus lock                    LOCK
Bus arbitration
  Bus request                 BREQ
  Common bus request          CBRQ
  Bus busy                    BUSY
  Bus clock                   BCLK
  Bus priority in             BPRN
  Bus priority out            BPRO
Power and ground (20 lines)
INTERPROCESSOR ARBITRATION STATIC ARBITRATION
Serial Arbitration Procedure
[Figure: Bus arbiters 1 to 4 connected in a daisy chain through PI and PO; PI of arbiter 1 is tied to 1 (highest priority), PO of arbiter 4 goes to the next arbiter; all arbiters share the bus busy line.]
Parallel Arbitration Procedure
[Figure: Bus arbiters 1 to 4 send Req lines to a 4 x 2 priority encoder, whose output drives a 2 x 4 decoder that returns the Ack lines; all arbiters share the bus busy line.]
INTERPROCESSOR ARBITRATION DYNAMIC ARBITRATION
Priorities of the units can be changed dynamically while the system is in operation
Time Slice
- A fixed-length time slice is given sequentially to each processor, in round-robin fashion
Polling
- Unit address polling: the bus controller advances the address to identify the requesting unit
LRU
FIFO
Rotating Daisy Chain
- Conventional Daisy Chain: highest priority to the unit nearest to the bus controller
- Rotating Daisy Chain: highest priority to the unit that is nearest to the unit that has most recently accessed the bus (it becomes the bus controller)
INTERPROCESSOR SYNCHRONIZATION
Synchronization Communication of control information between processors - To enforce the correct sequence of processes - To ensure mutually exclusive access to shared writable data Hardware Implementation Mutual Exclusion with a Semaphore Mutual Exclusion - One processor to exclude or lock out access to shared resource by other processors when it is in a Critical Section - Critical Section is a program sequence that, once begun, must complete execution before another processor accesses the same shared resource Semaphore - A binary variable - 1: A processor is executing a critical section,that not available to other processors 0: Available to any requesting processor - Software controlled Flag that is stored in memory that all processors can be access
SEMAPHORE
Testing and Setting the Semaphore
 - Two or more processors must be prevented from testing or setting the same semaphore at the same time
 - Otherwise, two or more processors may enter the same critical section at the same time
 - Must be implemented with an indivisible operation
R <- M[SEM]    / Test semaphore /
M[SEM] <- 1    / Set semaphore /
These two operations are performed while locked, so that other processors cannot test and set the semaphore while the current processor is executing them.
If R = 1, another processor is executing the critical section; the processor that executed this instruction does not access the shared memory.
If R = 0, the resource is available; the semaphore is now set to 1 and the processor accesses the shared memory.
The last instruction in the program must clear the semaphore.
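The test-and-set sequence can be imitated in software as a minimal sketch. Here Python threads stand in for processors and a `threading.Lock` stands in for the hardware lock that makes the two memory operations indivisible; the class `SharedMemory`, the functions `worker` and `run_demo`, and the shared counter are all hypothetical names introduced for the example.

```python
import threading
import time


class SharedMemory:
    def __init__(self):
        self.sem = 0  # 0: available, 1: a critical section is in progress
        self._bus_lock = threading.Lock()  # stands in for the hardware lock

    def test_and_set(self):
        with self._bus_lock:  # both steps happen while locked
            r = self.sem      # R <- M[SEM]   (test semaphore)
            self.sem = 1      # M[SEM] <- 1   (set semaphore)
        return r

    def clear(self):
        self.sem = 0  # last instruction of the critical section


def worker(mem, counter, iterations):
    for _ in range(iterations):
        while mem.test_and_set() == 1:
            time.sleep(0)  # R = 1: another "processor" is inside, retry
        counter[0] += 1    # critical section: update shared writable data
        mem.clear()


def run_demo(n_threads=4, iterations=200):
    mem = SharedMemory()
    counter = [0]
    threads = [
        threading.Thread(target=worker, args=(mem, counter, iterations))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], mem.sem
```

Because only one thread at a time can see R = 0, every increment of the counter happens inside the critical section, and `run_demo` returns the total count together with the final semaphore value (which is 0 once every worker has cleared it).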
CACHE COHERENCE
[Figure: three processors P1, P2 and P3, each with a private cache, connected to main memory over a common bus. Three panels show the value of X in main memory and in each cache.]
Caches are Coherent - Main memory: X = 52; caches of P1, P2, P3: X = 52, 52, 52
Cache Incoherency in Write Through Policy - Main memory: X = 120; caches of P1, P2, P3: X = 120, 52, 52
Cache Incoherency in Write Back Policy - Main memory: X = 52; caches of P1, P2, P3: X = 120, 52, 52
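The incoherent states for X can be reproduced with a small simulation. This is a simplified sketch under stated assumptions: the `Cache` class and `simulate` function are hypothetical, each cache holds whole values keyed by name, and no coherence mechanism is present.

```python
class Cache:
    """A private cache with no coherence mechanism."""

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:            # miss: load from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:                # write-through updates memory now
            self.memory[addr] = value         # write-back would defer this


def simulate(write_through):
    memory = {"X": 52}
    caches = [Cache(memory, write_through) for _ in range(3)]  # P1, P2, P3
    for cache in caches:
        cache.read("X")                       # all caches load X = 52
    caches[0].write("X", 120)                 # P1 stores 120 into X
    return memory["X"], [cache.lines["X"] for cache in caches]
```

`simulate(True)` gives `(120, [120, 52, 52])`: memory is updated but P2 and P3 hold stale copies. `simulate(False)` gives `(52, [120, 52, 52])`: with write-back, main memory is stale as well.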
MAINTAINING CACHE COHERENCY
Shared Cache
 - Disallow private caches
 - Access time delay
Software Approaches
* Read-Only Data are Cacheable
 - Private cache is for read-only data
 - Shared writable data are not cacheable
 - Compiler tags data as cacheable or noncacheable
 - Degrades performance due to software overhead
* Centralized Global Table
 - Status of each memory block is maintained in the CGT: RO (read-only) or RW (read and write)
 - All caches can have copies of RO blocks
 - Only one cache can have a copy of an RW block
Hardware Approaches
* Snoopy Cache Controller
 - Cache controllers monitor all bus requests from CPUs and IOPs
 - All caches attached to the bus monitor the write operations
 - When a word in a cache is written, memory is also updated (write through)
 - Local snoopy controllers in all other caches check their memory to determine whether they have a copy of that word; if they have, that location is marked invalid (a future reference to this location causes a cache miss)
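The snoopy write-through scheme described above might be modelled as follows. This is a minimal sketch, not a real protocol implementation: the `Bus` and `SnoopyCache` classes are hypothetical, a missing dictionary entry stands for an invalid line, and the bus delivers each write to all other caches instantly.

```python
class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, source, addr, value):
        self.memory[addr] = value             # write through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)       # every other cache sees the write


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                       # absent key = invalid / not cached
        self.misses = 0
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: reload from main memory
            self.misses += 1
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)            # mark the local copy invalid
```

If three such caches all read X = 52 and one then writes 120, the other two lose their copies of X; their next read of X misses and fetches 120 from main memory, so no processor sees the stale value.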