William Stallings Computer Organization and Architecture 8th
![Page 1: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/1.jpg)
William Stallings Computer Organization and Architecture
Chapter 4 Internal Memory
![Page 2: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/2.jpg)
♦Computer memory is organized into a hierarchy.
♦Going down the hierarchy: decreasing cost per bit, increasing capacity, slower access time, and decreasing frequency of access of the memory by the processor.
♦The cache automatically retains a copy of some of the recently used words from the DRAM.
The four-level memory hierarchy
![Page 3: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/3.jpg)
Memory Hierarchy
• Registers: in the CPU
• Internal or main memory (“RAM”): may include one or more levels of cache
• External memory: backing store
![Page 4: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/4.jpg)
4.1 COMPUTER MEMORY SYSTEM OVERVIEW
Characteristics of Memory Systems
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
![Page 5: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/5.jpg)
Location
The term location refers to whether memory is internal or external to the computer.
• CPU: the processor requires its own local memory, in the form of registers.
• Internal: main memory, cache
• External: peripheral storage devices, such as disk and tape
![Page 6: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/6.jpg)
Capacity
• Internal memory capacity is typically expressed in bytes (1 byte = 8 bits) or words.
• External memory capacity is typically expressed in bytes.
• Word
  • The natural unit of organisation
  • Common word lengths are 8, 16, and 32 bits
  • The size of the word is typically equal to the number of bits used to represent a number and to the instruction length. Unfortunately, there are many exceptions.
![Page 7: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/7.jpg)
Unit of Transfer
• Internal: usually governed by data bus width
• External: usually a block, which is much larger than a word
• Addressable unit
  • Smallest location which can be uniquely addressed
  • At the word level or byte level
  • In any case, $2^A = N$, where $A$ is the length in bits of an address and $N$ is the number of addressable units
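The relation between address length and the number of addressable units can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and do not come from the slides.

```python
def addressable_units(address_bits: int) -> int:
    """Number of addressable units N reachable with an A-bit address (N = 2**A)."""
    return 2 ** address_bits


def address_bits_needed(units: int) -> int:
    """Smallest address length A (in bits) such that 2**A >= units."""
    if units < 1:
        raise ValueError("units must be positive")
    return max(1, (units - 1).bit_length())


print(addressable_units(24))         # units reachable with a 24-bit address
print(address_bits_needed(2 ** 20))  # address length needed for 1M units
```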
![Page 8: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/8.jpg)
Access Methods (1)
• Sequential access
  • Start at the beginning and read through in order
  • Access time depends on the location of the data and the previous location, so it is variable
  • e.g. tape
• Direct access
  • Individual blocks have a unique address
  • Access is by jumping to the vicinity plus a sequential search
  • Access time depends on the location and the previous location, so it is variable
  • e.g. disk
![Page 9: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/9.jpg)
Access Methods (2)
• Random
  • Individual addresses identify locations exactly
  • Access time is independent of location or previous access and is constant
  • e.g. RAM
• Associative
  • Data is located by a comparison with the contents of a portion of the store
  • Access time is independent of location or previous access and is constant
  • e.g. cache
![Page 10: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/10.jpg)
Performance Parameters
• Access time
  • For random-access memory: the time it takes to perform a read or write operation, i.e. the time between presenting the address to the memory and getting the valid data
  • For non-random-access memory: the time it takes to position the read-write mechanism at the desired location
• Memory cycle time
  • Cycle time is access time plus any additional time
  • Time may be required for the memory to “recover” before the next access
• Transfer rate
  • Rate at which data can be moved
![Page 11: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/11.jpg)
Physical Types
• Semiconductor: RAM
• Magnetic: disk and tape
• Optical: CD (Compact Disk) and DVD (Digital Video Disk)
• Others: bubble, hologram
![Page 12: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/12.jpg)
Physical Characteristics
• Decay
• Volatility
  • In a volatile memory, information decays naturally or is lost when electrical power is switched off.
  • In a nonvolatile memory, no electrical power is needed to retain information, e.g. magnetic-surface memory.
• Erasable
• Power consumption
![Page 13: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/13.jpg)
Organisation
Organisation means the physical arrangement of bits into words
Obvious arrangement not always used
![Page 14: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/14.jpg)
Memory Hierarchy
• Registers: in the CPU
• Internal or main memory (“RAM”): may include one or more levels of cache
• External memory: backing store
![Page 15: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/15.jpg)
The Bottom Line
The design constraints on a computer’s memory:
• How much? Capacity
• How fast? Time is money
• How expensive?
There is a trade-off among the three key characteristics of memory: cost, capacity, and access time.
![Page 16: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/16.jpg)
Hierarchy List
• Registers
• L1 cache
• L2 cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape
![Page 17: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/17.jpg)
Hierarchy List
Across this spectrum of technologies:
• Faster access time, greater cost per bit
• Greater capacity, smaller cost per bit
• Greater capacity, slower access time
From top to bottom:
• Decreasing cost per bit
• Increasing capacity
• Increasing access time
• Decreasing frequency of access of the memory by the processor
![Page 18: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/18.jpg)
So you want fast?
• It is possible to build a computer which uses only static RAM (see later)
• This would be very fast
• This would need no cache
  • How can you cache cache?
• This would cost a very large amount
![Page 19: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/19.jpg)
Locality of Reference
During the course of the execution of a program, memory references tend to cluster, e.g. in loops and subroutines.
Main memory is usually extended with a higher-speed, smaller cache. It is a device for staging the movement of data between main memory and processor registers to improve performance.
External memory, also called secondary or auxiliary memory, is used to store program and data files and is visible to the programmer only in terms of files and records.
![Page 20: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/20.jpg)
4.2 Semiconductor Main Memory
Table 4.2 Semiconductor Memory Types
![Page 21: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/21.jpg)
Types of Random-Access Semiconductor Memory
• RAM
  • Misnamed, because all of the semiconductor memory types listed in the table are random access
  • Read/write
  • Volatile: a RAM must be provided with a constant power supply
  • Temporary storage
  • Static or dynamic
![Page 22: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/22.jpg)
Dynamic RAM (DRAM)
• Bits stored as charge in capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Need refresh circuits
• Slower
• Used for main memory
![Page 23: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/23.jpg)
Static RAM (SRAM)
• Bits stored as on/off switches
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Used for cache
![Page 24: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/24.jpg)
Read Only Memory (ROM)
• Permanent storage
• Applications
  • Microprogramming (see later)
  • Library subroutines
  • Systems programs (BIOS)
  • Function tables
![Page 25: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/25.jpg)
Types of ROM
• Written during manufacture
  • Very expensive for small runs
• Programmable (once): PROM
  • Needs special equipment to program
• Read “mostly”
  • Erasable Programmable (EPROM)
    • Erased by UV
  • Electrically Erasable (EEPROM)
    • Takes much longer to write than read
  • Flash memory
    • It is intermediate between EPROM and EEPROM in both cost and functionality.
    • Erase whole memory electrically or erase blocks of memory
![Page 26: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/26.jpg)
Organisation in detail
• Memory cell
  • The basic element of a semiconductor memory
  • Has two stable states
  • Can be written into to set the state, or read to sense the state
• Chip logic
  • One extreme organization: the physical arrangement of cells in the array is the same as the logical arrangement. The array is organized into W words of B bits each.
    • e.g. a 16-Mbit chip can be organised as 1M 16-bit words
  • The other extreme, one-bit-per-chip: data is read/written one bit at a time
    • A one-bit-per-chip system has 16 lots of 1-Mbit chips, with bit 1 of each word in chip 1, and so on
![Page 27: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/27.jpg)
Chip Logic
Typical organization of a 16-Mbit DRAM
• A 16-Mbit chip can be organised as a 2048 × 2048 × 4-bit array
• Reduces number of address pins
  • Multiplex row address and column address
  • 11 pins to address ($2^{11} = 2048$)
• An additional 11 address lines select one of 2048 columns of 4 bits per column.
• Four data lines are for the input and output of 4 bits to and from a data buffer.
• On write, the bit driver of each bit line is activated for a 1 or 0 according to the value of the corresponding data line.
• On read, the value of each bit line is passed through a sense amplifier and presented to the data lines. The row line selects which row of cells is used for reading or writing.
• Adding one more pin devoted to addressing doubles the number of rows and columns, and so the size of the chip memory grows by a factor of 4.
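The pin arithmetic can be reproduced in a short sketch. It assumes a square array whose row and column addresses share the same multiplexed pins, as on the slide; the function name and the default of 4 bits per location are illustrative.

```python
def dram_capacity_bits(address_pins: int, bits_per_location: int = 4) -> int:
    """Capacity of a DRAM whose row and column addresses are multiplexed
    over the same address pins (square array assumed)."""
    rows = 2 ** address_pins
    columns = 2 ** address_pins
    return rows * columns * bits_per_location


MBIT = 2 ** 20
print(dram_capacity_bits(11) // MBIT)                    # capacity in Mbit with 11 pins
print(dram_capacity_bits(12) // dram_capacity_bits(11))  # growth factor from one extra pin
```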
![Page 28: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/28.jpg)
Typical 16 Mb DRAM (4M x 4)
![Page 29: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/29.jpg)
Refreshing
• Refresh circuit included on chip
• Disable chip
• Count through rows
• Read and write back
• Takes time
• Slows down apparent performance
![Page 30: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/30.jpg)
Chip Packaging
EPROM package: a one-word-per-chip, 8-Mbit chip organized as 1M×8
•The address of the word being accessed. For 1M words, a total of 20 pins ($2^{20}$ = 1M) are needed.
•The data lines D0~D7, one per bit of the 8-bit word
•The power supply to the chip (VCC)
•A ground pin (Vss)
•A chip enable (CE) pin: the CE pin is used to indicate whether or not the address is valid for this chip.
•A program voltage (Vpp)
![Page 31: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/31.jpg)
DRAM package: a 16-Mbit chip organized as 4M×4
Because a RAM chip can be updated, its data pins are input/output, unlike those of a ROM chip.
•Write Enable pin (WE)
•Output Enable pin (OE)
•Row Address Select (RAS)
•Column Address Select (CAS)
![Page 32: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/32.jpg)
Module Organisation
·If a RAM chip contains only 1 bit per word, clearly a number of chips equal to the number of bits per word is needed.
e.g. How could a memory module consisting of 256K 8-bit words be organized?
256K = $2^{18}$, so an 18-bit address is needed.
The address is presented to eight 256K×1-bit chips, each of which provides the input/output of 1 bit.
Figure 4.6 256-Kbyte Memory Organization
![Page 33: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/33.jpg)
Module Organisation (2)
Figure 4.7 1-Mbyte Memory Organization
![Page 34: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/34.jpg)
(1M × 8 bit) / (256K × 8 bit) = 4 = $2^2$
As shown in Figure 4.7, 1M words of 8 bits per word is organized as four columns of chips, each column containing 256K words arranged as in Figure 4.6.
1M = $2^{20}$
For 1M words, 20 address lines are needed. The 18 least significant bits are routed to all 32 modules. The high-order 2 bits are input to a group select logic module that sends a chip enable signal to one of the four columns of modules.
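The address split for the 1-Mbyte module can be sketched as follows. The function name is hypothetical; the field widths (2 group-select bits, 18 bits routed to every chip) are taken from the slide.

```python
GROUP_BITS = 2    # high-order bits fed to the group select logic
OFFSET_BITS = 18  # low-order bits routed to every chip


def decode_module_address(address: int) -> tuple[int, int]:
    """Split a 20-bit address into (column of chips to enable, 18-bit address within it)."""
    if not 0 <= address < 1 << (GROUP_BITS + OFFSET_BITS):
        raise ValueError("address must fit in 20 bits")
    group = address >> OFFSET_BITS
    offset = address & ((1 << OFFSET_BITS) - 1)
    return group, offset


print(decode_module_address(0x40000))  # first word of the second column
```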
![Page 35: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/35.jpg)
Error Correction
• Hard failure
  • Permanent defect
• Soft error
  • Random, non-destructive
  • No permanent damage to memory
• Detected using Hamming error-correcting code
![Page 36: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/36.jpg)
Error Correcting Code Function
•A function f is performed on the data to produce a code.
•When the previously stored word is read out, the code is used to detect and possibly correct errors.
•A new set of K code bits is generated from the M data bits and compared with the fetched code bits.
![Page 37: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/37.jpg)
Figure 4.9 Hamming Error-Correcting Code
Even Parity bits
Figure 4.9 uses Venn diagrams to illustrate the use of Hamming code on 4-bit words (M=4). With three intersecting circles, there are seven compartments. We assign the 4 data bits to the inner compartments. The remaining compartments are filled with parity bits. Each parity bit is chosen so that the total number of 1s in its circle is even.
![Page 38: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/38.jpg)
The comparison logic receives as input two K-bit values. A bit-by-bit comparison is done by taking the exclusive-or of the two inputs. The result is called the syndrome word. The syndrome word is therefore K bits wide and has a range between 0 and $2^K - 1$. The value 0 indicates that no error was detected, leaving $2^K - 1$ values to indicate, if there is an error, which bit was in error (the numerical value of the syndrome indicates the position of the bit in error).
An error could occur on any of the M data bits or K check bits, so $2^K - 1 \geq M + K$
(This inequality gives the number of bits needed to correct a single bit error in a word containing M data bits.)
Figure 4.8 Error-Correcting Code
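The smallest K satisfying the inequality for a given M can be found by a simple search. The sketch below is illustrative (the function name is not from the slides).

```python
def check_bits_for_sec(data_bits: int) -> int:
    """Smallest K with 2**K - 1 >= M + K, i.e. enough syndrome values to
    point at any one of the M data bits or K check bits."""
    k = 1
    while 2 ** k - 1 < data_bits + k:
        k += 1
    return k


for m in (8, 16, 32, 64):
    print(m, check_bits_for_sec(m))
```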
![Page 39: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/39.jpg)
Figure 4.10 Layout of Data bits and Check bits
Those bit positions whose position numbers are powers of 2 are designated as check bits.
Each check bit operates on every data bit position whose position number contains a 1 in the corresponding column position.
Bit position n is checked by those bits $C_i$ such that $\sum i = n$.
C8 C4 C2 C1
![Page 40: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/40.jpg)
The check bits are calculated as follows, where the symbol ⊕ designates the exclusive-or operation:
Assume that the 8-bit input word is 00111001, with data bit M1 in the right-most position. The calculations are as follows:
Suppose that data bit 3 sustains an error and is changed from 0 to 1.
![Page 41: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/41.jpg)
When the new check bits are compared with the old check bits, the syndrome word is formed:
The result is 0110, indicating that bit position 6, which contains data bit 3, is in error.
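The worked example can be reproduced with a short sketch. It assumes the layout described for Figure 4.10 (check bits at power-of-two positions, data bits filling the remaining positions in order) and relies on the fact that XOR-ing the position numbers of all data bits equal to 1 yields the check bits C8 C4 C2 C1 as one binary number. The function names are illustrative.

```python
def data_positions(m: int) -> list[int]:
    """1-based positions that are not powers of two, in order, for m data bits."""
    positions = []
    pos = 1
    while len(positions) < m:
        if pos & (pos - 1) != 0:  # powers of two are reserved for check bits
            positions.append(pos)
        pos += 1
    return positions


def check_bits(data: int, m: int = 8) -> int:
    """Check bits as an integer (C1 is bit 0, C2 bit 1, C4 bit 2, C8 bit 3).
    Data bit 1 is the least significant bit of `data`."""
    value = 0
    for i, pos in enumerate(data_positions(m)):
        if (data >> i) & 1:
            value ^= pos
    return value


stored_data = 0b00111001
stored_check = check_bits(stored_data)
corrupted = stored_data ^ (1 << 2)               # flip data bit 3
syndrome = stored_check ^ check_bits(corrupted)  # compare old and new check bits
print(format(stored_check, "04b"), format(syndrome, "04b"))  # 0111 0110
```

The syndrome value 6 is the position number of the flipped bit, and `data_positions(8)` places data bit 3 at position 6.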
![Page 42: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/42.jpg)
Figure 4.11 Check Bit Degeneration
a single-error-correction (SEC) code
![Page 43: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/43.jpg)
More commonly, semiconductor memory is equipped with a single-error-correcting double-error-detecting (SEC-DED) code. An error-correction code enhances the reliability of the memory at the cost of added complexity.
Table 4.3 Increase in Word Length with Error Correction
![Page 44: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/44.jpg)
Figure 4.12 Hamming SEC-DED Code
The sequence shows that if two errors occur (Figure 4.12 c), the checking procedure goes astray (d) and worsens the problem by creating a third error (e). To overcome the problem, an eighth bit is added that is set so that the total number of 1s in the diagram is even.
![Page 45: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/45.jpg)
4.3 CACHE MEMORY
• Small amount of fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module
![Page 46: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/46.jpg)
Cache operation - overview
Figure 4.14 Cache/Main-Memory Structure (P118)
• Cache includes tags to identify which block of main memory is in each cache slot.
• The tag is usually a portion of the main memory address.
(a) Cache: lines numbered 0, 1, 2, …, C-1, each holding a tag and a block; block length is K words.
![Page 47: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/47.jpg)
(b) Main memory: memory addresses 0, 1, 2, 3, …, $2^n - 1$, each holding one word (word length), grouped into blocks of K words.
![Page 48: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/48.jpg)
Figure 4.15 Cache Read Operation (P119)
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main memory to cache
• Then deliver from cache to CPU
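The read flow can be sketched in Python as below. This is a simplified model, not the hardware mechanism: the cache is a dictionary keyed by block number with no capacity limit or replacement, and `BLOCK_SIZE`, `read_word`, and the sample memory contents are assumptions made for illustration.

```python
BLOCK_SIZE = 4  # words per block (assumed for this sketch)


def read_word(address: int, cache: dict, main_memory: list):
    """Return (word, hit). On a miss, the whole block is first copied
    from main memory into the cache, then the word is delivered from the cache."""
    block_number = address // BLOCK_SIZE
    hit = block_number in cache
    if not hit:
        start = block_number * BLOCK_SIZE
        cache[block_number] = main_memory[start:start + BLOCK_SIZE]
    return cache[block_number][address % BLOCK_SIZE], hit


main_memory = list(range(100, 116))  # 16 words of sample data
cache = {}
print(read_word(5, cache, main_memory))  # miss: block 1 is loaded
print(read_word(6, cache, main_memory))  # hit: same block
```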
![Page 49: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/49.jpg)
Typical Cache Organization
Figure 4.16 Typical Cache Organization
•In this organization, the cache connects to the processor via data, control, and address lines.
•The data and address lines attach to data and address buffers, which attach to a system bus from which main memory is reached.
•When a cache hit occurs, the data and address buffers are disabled and communication is only between processor and cache, with no system bus traffic.
•When a cache miss occurs, the desired address is loaded onto the system bus and the data are returned through a data buffer to both the cache and the processor.
![Page 50: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/50.jpg)
Elements of Cache Design
• Size
• Mapping function
  • Direct
  • Associative
  • Set associative
• Replacement algorithm
  • Least recently used (LRU)
  • First in first out (FIFO)
  • Least frequently used (LFU)
  • Random
• Write policy
  • Write through
  • Write back
  • Write once
• Block size
• Number of caches
  • Single or two level
  • Unified or split
![Page 51: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/51.jpg)
Cache Size
A trade-off between cost per bit and access time
• Cost
  • More cache is expensive
• Speed
  • More cache is faster (up to a point)
  • Checking cache for data takes time
• “Optimum” cache sizes
  • Suggested: between 1K and 512K words.
![Page 52: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/52.jpg)
Mapping Function
• Three techniques: direct, associative, and set associative
• Elements of the example
  • Cache of 64 KBytes
  • Cache block of 4 bytes
    • Data is transferred between memory and the cache in blocks of 4 bytes each.
    • i.e. cache is 16K ($2^{14}$) lines of 4 bytes
  • 16 MBytes main memory
    • 24-bit address ($2^{24}$ = 16M)
    • Main memory: 4M blocks of 4 bytes each
![Page 53: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/53.jpg)
Direct Mapping
• Each block of main memory maps to only one cache line
  • i.e. if a block is in cache, it must be in one specific place
• Address is in two parts
• Least significant w bits identify a unique word or byte within a block of main memory.
• Most significant s bits specify one memory block
• The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant)
• The line field of r bits identifies one of the $m = 2^r$ lines of the cache
![Page 54: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/54.jpg)
Direct Mapping Cache Line Table
| Cache line | Main memory blocks assigned |
| --- | --- |
| 0 | 0, m, 2m, …, $2^s - m$ |
| 1 | 1, m+1, 2m+1, …, $2^s - m + 1$ |
| m-1 | m-1, 2m-1, 3m-1, …, $2^s - 1$ |

The mapping is expressed as: i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
Every row has the same cache line number; every column has the same tag number.
No two blocks in the same line have the same Tag field!
![Page 55: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/55.jpg)
Direct Mapping Cache Organization
The r-bit line number is used as an index into the cache to access a particular line.
If the (s-r)-bit tag number matches the tag number currently stored in that line, then the w-bit word number is used to select one of the $2^w$ bytes in that line.
Otherwise, the s-bit tag-plus-line field is used to fetch a block from main memory.
![Page 56: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/56.jpg)
Direct Mapping Address Structure

| Tag (s-r) | Line or Slot (r) | Word (w) |
| --- | --- | --- |
| 8 | 14 | 2 |

• 24-bit address
• w = 2-bit word identifier (4-byte block)
• s = 22-bit block identifier
  • 8-bit tag (= 22 - 14)
  • 14-bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
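The 8/14/2 split can be expressed with shifts and masks. In this sketch the function name is hypothetical, and the sample address 16339C is borrowed from the associative-mapping slide purely for illustration.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2


def split_direct(address: int) -> tuple[int, int, int]:
    """Split a 24-bit address into (tag, line, word) for the direct-mapped example."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")  # tag=16 line=0CE7 word=0
```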
![Page 57: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/57.jpg)
Direct Mapping Example
Main Memory Address
• The cache is organized as 16K = $2^{14}$ lines of 4 bytes each.
• The main memory consists of 16 Mbytes, organized as 4M blocks of 4 bytes each.
• i = j modulo m, where
  • i = cache line number
  • j = main memory block number
  • m = number of lines in the cache
• Note that no two blocks that map into the same line number have the same tag number.
![Page 58: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/58.jpg)
Direct Mapping pros & cons
• Advantages
  • Simple
  • Inexpensive
• Disadvantages
  • Fixed location for given block
  • If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high
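The cost of two blocks competing for one line can be seen in a small simulation. This is an illustrative sketch: it records the resident block number per line (equivalent to comparing tags), and the function name and access patterns are assumptions.

```python
def count_misses(block_numbers, num_lines: int) -> int:
    """Count misses in a direct-mapped cache where block j goes to line j mod num_lines."""
    resident = {}  # line number -> block number currently held
    misses = 0
    for j in block_numbers:
        i = j % num_lines
        if resident.get(i) != j:
            misses += 1
            resident[i] = j  # the new block evicts whatever was in the line
    return misses


lines = 2 ** 14
print(count_misses([0, lines] * 4, lines))  # both blocks map to line 0: every access misses
print(count_misses([0, 1] * 4, lines))      # different lines: only the first two accesses miss
```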
![Page 59: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/59.jpg)
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as a tag and a word field.
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Disadvantages of associative mapping
  • Cache searching gets expensive
  • Complex circuitry required to examine the tags of all cache lines in parallel.
![Page 60: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/60.jpg)
Fully Associative Cache Organization
![Page 61: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/61.jpg)
Associative Mapping Address Structure

| Tag | Word |
| --- | --- |
| 22 bit | 2 bit |

• 22-bit tag stored with each 32-bit (4-byte) block of data
• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which byte is required from the 32-bit data block
• e.g.

| Address | Tag | Data | Cache line |
| --- | --- | --- | --- |
| 16339C | 058CE7 | FEDCBA98 | 0001 |
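The tag in the example row is simply the address with its 2 word bits shifted out, which a short sketch confirms (the function name is illustrative).

```python
WORD_BITS = 2


def split_associative(address: int) -> tuple[int, int]:
    """Split a 24-bit address into (22-bit tag, 2-bit word) for associative mapping."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


tag, word = split_associative(0x16339C)
print(f"tag={tag:06X} word={word}")  # tag=058CE7 word=0
```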
![Page 62: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/62.jpg)
Associative Mapping Example
Main Memory Address
![Page 63: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/63.jpg)
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
  • e.g. Block B can be in any line of set i
• e.g. 2 lines per set
  • 2-way associative mapping
  • A given block can be in one of 2 lines in only one set
![Page 64: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/64.jpg)
Set Associative Mapping
In this case, the cache is divided into v sets, each of which consists of k lines.
The relationships are
  m = v × k
  i = j modulo v
where
  i = cache set number
  j = main memory block number
  m = number of lines in the cache
This is referred to as k-way set associative mapping.
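The two relationships can be checked with a small Python sketch. The function names `sets_in_cache` and `cache_set` and the sizes m = 8, k = 2 are assumptions chosen for illustration; they do not come from the slides.

```python
def sets_in_cache(num_lines, ways):
    """Return v from m = v * k (m must be a multiple of k)."""
    if num_lines % ways != 0:
        raise ValueError("number of lines must be a multiple of k")
    return num_lines // ways


def cache_set(block_number, v):
    """Return i = j modulo v, the only set that block j may occupy."""
    return block_number % v


v = sets_in_cache(num_lines=8, ways=2)            # m = 8, k = 2 gives v = 4
set_of_block = [cache_set(j, v) for j in range(10)]
```

Blocks 0, 4 and 8 all land in set 0, so they compete for the k = 2 lines of that set while the other sets are unaffected.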
![Page 65: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/65.jpg)
Two Way Set Associative Cache Organization
The d set bits specify one of v = 2^d sets. The s bits of the tag and set fields specify one of the 2^s blocks of main memory.
With k-way set associative mapping, the tag in a memory address is much smaller and is only compared to the k tags within a single set.
![Page 66: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/66.jpg)
Set Associative Mapping Example
13 bit set number
Block number in main memory is modulo 2^13
000000, 008000, 010000, 018000 … map to same set
![Page 67: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/67.jpg)
Set Associative Mapping Address Structure
Tag 9 bit | Set 13 bit | Word 2 bit
Use set field to determine cache set to look in
Tag+Set field specifies one of the blocks in the main memory
Compare tag field to see if we have a hit
e.g.
Address    Tag  Data      Set number
1FF 7FFC   1FF  24682468  1FFF
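A short Python sketch of the three-field decomposition follows. It assumes that the example writes an address as the 9-bit tag followed by the remaining 15 bits (so "1FF 7FFC" is read as the 24-bit address FFFFFC); the helper names `join_fields` and `split_address` are hypothetical.

```python
WORD_BITS, SET_BITS, TAG_BITS = 2, 13, 9


def join_fields(tag, low15):
    """Build a 24-bit address from a 9-bit tag and the low 15 bits (assumed notation)."""
    return (tag << (WORD_BITS + SET_BITS)) | low15


def split_address(addr):
    """Split a 24-bit address into its (tag, set number, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_number = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_number, word


addr = join_fields(0x1FF, 0x7FFC)
tag, set_number, word = split_address(addr)
```

For the example row this gives tag 1FF and set number 1FFF, matching the Tag and Set number columns.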
![Page 68: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/68.jpg)
Two Way Set Associative Mapping Example
e.g.
Address    Tag  Data      Set number
1FF 7FFC   1FF  24682468  1FFF
02C 0004   02C  11235813  0001
Main Memory Address
![Page 69: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/69.jpg)
Replacement Algorithms (1) Direct mapping
When a new block is brought into the cache, one of the existing blocks must be replaced.
Direct mapping
  No choice
  Each block only maps to one line
  Replace that line
![Page 70: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/70.jpg)
Replacement Algorithms (2) Associative & Set Associative
Hardware implemented algorithm (speed)
Least recently used (LRU)
  Replace the block in the set that has been in the cache longest with no reference to it (hit ratio + time)
  e.g. in 2 way set associative: which of the 2 blocks is LRU?
First in first out (FIFO)
  Replace the block in the set that has been in the cache longest (time)
Least frequently used (LFU)
  Replace the block in the set that has had the fewest hits (hit ratio)
Random
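LRU replacement within one set can be modeled in software as below. This is a sketch only: real caches implement the policy in hardware (a 2-way set needs just one USE bit per line), and the class name `LRUSet` and the tag sequence are assumptions made for the example.

```python
from collections import OrderedDict


class LRUSet:
    """One k-line set with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # keys are tags, least recently used first

    def access(self, tag):
        """Reference a tag; return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # now the most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used tag
        self.lines[tag] = None
        return False


two_way = LRUSet(ways=2)
results = [two_way.access(tag) for tag in ["A", "B", "A", "C", "B"]]
```

In this trace the second reference to A is a hit and makes B the LRU block, so C replaces B and the final reference to B misses. Under FIFO the hit on A would not change the order, C would replace A instead, and the final reference to B would hit.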
![Page 71: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/71.jpg)
Write Policy
Must not overwrite a cache block unless main memory is up to date
Problems to contend with
  More than one device may have access to main memory: data inconsistent between memory and cache
  Multiple CPUs may have individual caches: data inconsistent among caches
Write policy
  Write through
  Write back
  Write once
![Page 72: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/72.jpg)
Write through
All writes go to main memory as well as cache
Any other processor-cache module can monitor traffic to main memory to keep its local cache up to date
Disadvantages
  Lots of traffic
  Slows down writes
![Page 73: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/73.jpg)
Write back
Updates initially made in cache only
Update bit for cache slot is set when update occurs
If block in cache is to be replaced, write to main memory only if update bit is set
Other caches get out of sync
I/O must access main memory through cache, because portions of main memory are invalid
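The role of the update bit can be illustrated with a one-line cache in Python. This is a deliberately simplified sketch: the class name `WriteBackLine` is hypothetical, a block is modeled as a single value, and a write miss simply allocates the line without first reading the block from memory.

```python
class WriteBackLine:
    """A single cache line with an update (dirty) bit."""

    def __init__(self):
        self.tag = None
        self.data = None
        self.update_bit = False
        self.memory_writes = 0  # number of write-backs to main memory

    def write(self, tag, data, memory):
        """Write data for block `tag`, replacing the resident block if needed."""
        if self.tag is not None and self.tag != tag:
            self.evict(memory)
        self.tag, self.data = tag, data
        self.update_bit = True  # cache copy now differs from main memory

    def evict(self, memory):
        """Replace the block; write it to memory only if the update bit is set."""
        if self.update_bit:
            memory[self.tag] = self.data
            self.memory_writes += 1
        self.tag, self.data, self.update_bit = None, None, False


memory = {}
line = WriteBackLine()
for value in (1, 2, 3):
    line.write("X", value, memory)  # three updates, made in cache only
stale = "X" not in memory           # main memory has seen none of them
line.write("Y", 9, memory)          # replacing X forces a single write-back
```

Three writes to block X cost one memory write instead of the three that write through would need, but until the replacement happens main memory holds no valid copy of X, which is why I/O has to go through the cache.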
![Page 74: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/74.jpg)
Approaches to cache coherency
Bus watching with write through
  Each cache controller monitors the address lines to detect write operations to memory by other bus masters.
  This strategy depends on the use of a write-through policy by all cache controllers.
Hardware transparency
  Additional hardware is used to ensure that all updates to main memory via cache are reflected in all caches.
Noncacheable memory
  Only a portion of main memory is shared by more than one processor.
  In such a system, all accesses to shared memory are cache misses, because the shared memory is never copied to the cache.
  The noncacheable memory can be identified using chip-select logic or high-address bits.
![Page 75: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/75.jpg)
Line Size
The principle of locality
  Data in the vicinity of a referenced word is likely to be referenced in the near future.
The relationship between block size and hit ratio is complex, depending on the locality characteristics of a particular program, and no definitive optimum value has been found.
A size of two to eight words seems reasonably close to optimum.
![Page 76: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/76.jpg)
Number of caches
A single cache
Multiple caches
  The number of levels of caches
  The use of unified versus split caches
Split caches: one dedicated to instructions and one dedicated to data
  Key advantage of split caches: eliminate contention for cache between the instruction processor and the execution unit.
Unified cache: a single cache used to store references to both data and instructions
  For a given cache size, a unified cache has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically.
![Page 77: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/77.jpg)
Number of caches
The on-chip cache: cache and processor on the same chip
  When the requested instruction or data is found in the on-chip cache, the bus access is eliminated.
  Because of the short data paths internal to the processor, on-chip cache accesses will complete appreciably faster than would even zero-wait state bus cycles.
Advantages
  Reduce the processor’s external bus activity
  Speed up execution times
  Increase overall system performance
A two-level cache
  The internal cache designated as level 1 (L1)
  The external cache designated as level 2 (L2)
![Page 78: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/78.jpg)
4.4 Pentium Cache
Foreground reading
Find out detail of Pentium II cache systems
NOT just from Stallings!
![Page 79: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/79.jpg)
4.5 Newer RAM Technology (1)
Basic DRAM same since first RAM chips
Constraints of the traditional DRAM chip: its internal architecture and its interface to the processor’s memory bus
Enhanced DRAM
  Contains small SRAM as well
  SRAM holds last line read
  A comparator stores the 11-bit value of the most recent row address selection
Cache DRAM (CDRAM)
  Larger SRAM component
  Use as cache or serial buffer
![Page 80: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/80.jpg)
Newer RAM Technology (2)
Synchronous DRAM (SDRAM)
  Access is synchronized with an external clock, unlike asynchronous DRAM
  Address is presented to RAM
  Since SDRAM moves data in time with system clock, CPU knows when data will be ready
  CPU does not have to wait, it can do something else
  Burst mode allows SDRAM to set up stream of data and fire it out in block
![Page 81: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/81.jpg)
Internal logic of the SDRAM
• In burst mode, a series of data bits can be clocked out rapidly after the first bit has been accessed.
  Burst mode is useful when all the bits to be accessed are in sequence and in the same row of the array as the initial access.
• A dual-bank internal architecture that improves opportunities for on-chip parallelism.
• The mode register and associated control logic provide a mechanism to customize the SDRAM to suit specific system needs.
![Page 82: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/82.jpg)
Newer RAM Technology (3)
Foreground reading
Check out any other RAM you can find
See Web site: The RAM Guide
![Page 83: William Stallings Computer Organization and Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062516/56812aa7550346895d8e6b81/html5/thumbnails/83.jpg)
Exercises
P143 4.4, 4.6, 4.7, 4.8
P145 4.20
Deadline