Introduction to Computer Organization (Cache Memory)
KR Chowdhary
Professor & Head
Email: [email protected]
webpage: krchowdhary.com
Department of Computer Science and Engineering, MBM Engineering College, Jodhpur
November 14, 2013
KR Chowdhary Cache Memory 1/ 30
Introduction to cache
Figure 1: Memory hierarchy. (CPU ↔ cache M1, fast, via word transfer; cache M1 ↔ main memory M2, slow, via block transfer.)
* Speed of CPU v/s main memory is typically 10:1.
* A high-speed cache memory (M1) of relatively small size is provided between main memory (M2) and CPU, forming the (M1, M2) hierarchy. Some mechanism is provided in the cache so that tA ≈ tA1 (⇒ cache hit ratio ≈ 1).
* Data and instructions are pre-fetched into the cache. Due to locality of reference, other instructions/data are likely to be found in the cache.
* Cache is usually not visible to programmers.
* Approach: one can search the cache for a desired address's word, matching against all locations in the cache simultaneously.
* Cache size should be small enough that the average cost per bit approaches that of RAM, and large enough that the average access time approaches that of the cache.
* VAX 11/780: 8-KB general-purpose cache; Motorola 68020: 256-byte on-chip instruction cache; 8086: 4/6-byte instruction cache; dual core: L1: 512 KB-2 MB, L2: 1-2 MB. Instruction cache v/s GP cache?
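The effect of the hit ratio on access time can be sketched numerically. A minimal sketch (not from the slides), assuming illustrative timings tA1 = 10 ns for the cache and tA2 = 100 ns for RAM (the 10:1 ratio above), and a model in which a miss pays both the cache probe and the RAM access:

```python
def effective_access_time(hit_ratio, t_cache=10.0, t_ram=100.0):
    """tA = h*tA1 + (1 - h)*(tA1 + tA2): a miss pays the cache probe and the RAM access."""
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_ram)

for h in (0.90, 0.99, 1.0):
    print(h, effective_access_time(h))
```

As the hit ratio approaches 1, the effective access time approaches tA1, which is the sense of "tA ≈ tA1" above.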
Cache Principle
◮ Principle: When the processor attempts to read a word, its presence is checked in the cache first. If found, it is delivered to the processor; else a block of a fixed no. of words, which contains this word, is read into the cache. Simultaneously, the word is also delivered to the CPU.
◮ If the word was found in the cache, it is a cache hit, else it is a cache miss.
Figure 2: Cache position. (Data flow with cache: processor (CPU) ↔ cache M1 ↔ main memory M2.)
◮ Cache Design: Let the no. of words in RAM = 2^n, each addressable by n bits; cache size m; RAM of M blocks. Word size may be 32, 64, . . . bits.
◮ Memory consists of fixed-length blocks, each of size K words (= 2^w); total blocks are M = 2^n ÷ K. Cache has m = 2^r blocks or lines. Each line has 2^w words, plus a tag field and a ctrl-field (to show if the line has been modified).
◮ The tag identifies which particular blocks are presently in the cache.
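These size relations reduce to a few lines of arithmetic. A minimal sketch, with assumed example sizes (n = 16 address bits, K = 8 words per block, a cache of m = 32 lines; the numbers are illustrative, not from the slides):

```python
import math

# Assumed example sizes: n address bits, K words/block, m cache lines.
n, K, m = 16, 8, 32
w = int(math.log2(K))   # w bits to address a word within a block
M = 2**n // K           # M = 2^n / K: total blocks in RAM
s = n - w               # s bits to represent a block number
r = int(math.log2(m))   # r bits to represent a cache line number
print(M, s, r)
```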
Cache Principle
Mapping Function: Since the no. of lines in cache m ≪ M (blocks in RAM), an algorithm is needed to map the M blocks to the m lines. It determines which memory block occupies which cache line.
Figure 3: Address mapping. (Main memory: addresses 0 to 2^n − 1, grouped into blocks of K = 2^w words; cache memory: rows 0 to m − 1, each row holding a tag, a line no., and one block of K words.)
Cache Read (Figure 4: Accessing a word):
1. start: addr bus ← PC.
2. Is the block in cache? If yes: CPU ← word; end.
3. If no: access the main memory block having the word; allocate a cache line for the block; load the memory block into the cache; deliver the word to the CPU; end.
Three mapping techniques: direct,associative, set associative.
Direct-Mapped Cache
◮ Each block always maps to one and only one line in the cache. The mapping is expressed as:
i = j mod m
where j is the main memory block no., i is the cache line no., and m is the no. of lines in the cache.
◮ Word address space = n bits. RAM is divided into fixed-length blocks of K = 2^w words each. A word may be equal to a byte. ∴ number of blocks in RAM = 2^n/K = M = 2^s (s is the number of bits required to represent a block no.). Let m = 2^r be the number of rows in the cache, each of size one block = 2^w words. r is the no. of bits required for the row count. The address space in RAM is:
Figure 5: Address space in RAM. (The n-bit word address = a block address of s = (n − w) bits, followed by w bits of word address within the block; the block address further splits into (s − r) region-address bits and r row bits.)
◮ There is need of some kind of mapping between the 2^r rows in cache v/s the 2^s blocks in RAM, as only 2^r out of the 2^s blocks can be resident in the cache (a block fits in a row).
Direct-Mapped Cache
◮ Direct mapping: If 2^s mod 2^r = 0, then RAM is divided into 2^s ÷ 2^r = R regions as follows:
1. region 0's block nos.: 0, . . . , 2^r − 1, mapped to lines 0, . . . , 2^r − 1 of cache,
2. region 1's block nos.: 2^r, . . . , 2·2^r − 1, mapped to lines 0, . . . , 2^r − 1 of cache,
3. . . .
4. region R − 1's block nos.: (R − 1)·2^r, . . . , R·2^r − 1, mapped to lines 0, . . . , 2^r − 1 of the cache.
◮ If 2^s is the total no. of blocks in RAM, and 2^r is the no. of rows in cache, with row size = block size = 2^w words, then the i-th block of RAM is mapped to the j-th row of cache, where j = i mod 2^r.
◮ E.g., row 5 in cache will hold one block (the 5th block) from only one of the regions 0, . . . , R − 1 at a time.
◮ Thus, the row address corresponding to the r bits from the address field is a direct index to the block position in a region.
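The region structure can be sketched in a few lines. A toy example (sizes assumed, not from the slides): 32 RAM blocks, 8 cache rows, hence R = 4 regions; the blocks sharing cache row 5 are one per region:

```python
# Assumed toy sizes: 2^s = 32 RAM blocks, 2^r = 8 cache rows.
total_blocks, rows = 32, 8
R = total_blocks // rows                 # number of regions
for region in range(R):
    blocks = [region * rows + line for line in range(rows)]
    # each block of this region maps to line (block mod rows)
    assert all(b % rows == line for line, b in enumerate(blocks))
shared = [region * rows + 5 for region in range(R)]
print(shared)   # the blocks that all compete for cache row 5
```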
Direct-Mapped Cache
Figure 6: Address mapping. (Main memory is divided into regions 0, 1, . . . , R − 1, each of 2^r blocks of 2^w bytes, starting at block nos. 0, 2^r, 2·2^r, . . . , (R − 1)·2^r; the cache memory has 2^r rows, each with a tag field of s − r bits. Block no. i of main memory is mapped to line j = i mod 2^r of the cache.)
◮ For a match, the r bits (i.e., the index, having value, say, 1011) are taken from the address field, and the tag field in row no. (1011)₂ of the cache is compared with the s − r bits of the address field. If matched (equal), it is a cache hit, else a cache miss.
Direct Mapped Cache
Figure 7: Address mapping. (Total address space = 2^(s+w). The address splits into (s − r) tag bits, r index bits, and w word bits. The r bits act only as an index to a row in the cache; if the tag field of that row matches the s − r bits of the address, it is a hit. Only one comparison is required per address in the direct-mapped cache.)
◮ Only one comparison required to test for hit, hence it requires simplecircuitry to implement.
◮ But if there are frequent references to blocks from different regions that map to the same line, it will cause thrashing, and cache functioning will be inefficient.
Example: Direct Mapped Cache
◮ Let n = 32 (= 4 GB), w = 4. One block size K = 2^4 = 16 bytes. No. of total blocks = 2^(32−4) = 2^28 = 256 million. Size of cache = 4 MB; no. of rows in cache = 4 MB/16 bytes = 2^22/2^4 = 2^18. Total no. of regions R = (total blocks)/(blocks in cache) = 2^28/2^18 = 2^10.
◮ ∴ n = 32, w = 4, r = 18, s = n − w = 32 − 4 = 28. Tag = s − r = 28 − 18 = 10 bits.
◮ For the address (A31 . . . A0):
1000.1111.1001.0110.1100.1001.1010.0001
word addr. in block = (A3−A0) = 0001 = 1; line addr. = (A21−A4) = 010110110010011010₂; region adr. = A31 . . . A22 = tag bits.
◮ This line addr. is common for all the 2^10 regions. Thus, if the tag bits in the cache for line no. 010110110010011010₂ match the region no. address A31 . . . A22, then the block with line address (A21−A4) = 010110110010011010₂ is in the cache, else not.
◮ The line in cache is i = j mod m = (A31−A4) mod 2^18 = 1000.1111.1001.0110.1100.1001.1010 mod 2^18 = 01.0110.1100.1001.1010.
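The worked example can be verified mechanically. A sketch in Python of the same field extraction, using the address and sizes of the example above:

```python
# Example values from the slide: n = 32, w = 4, r = 18.
addr = 0b1000_1111_1001_0110_1100_1001_1010_0001
w, r = 4, 18

word = addr & 0xF          # A3..A0: word within the block
block = addr >> w          # 28-bit block number j (A31..A4)
line = block % (1 << r)    # i = j mod 2^18, i.e. A21..A4
tag = addr >> (w + r)      # region no. A31..A22 (10 tag bits)
print(bin(line), bin(tag), word)
```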
Block Replacement Algorithm
◮ Objective: Every request for an instruction/data fetch should be met by the cache. Transferring blocks to the cache will improve the hits. Hit ratio = (total hits)/(total attempts), which should be close to unity (normally > 0.99).
◮ The objective is partly achieved by the algorithm used for block replacement.
◮ What happens when a request for loading a new block in the cache does not find space in the cache?
◮ One of the rows (block frame or line) is to be moved back to main memory to create space for the new block.
◮ Which block should be removed?
1. The one which has remained in the cache for the longest time: FIFO (first in first out) algorithm.
2. The one which was used longest ago: LRU (least recently used) algorithm.
3. The one which is not frequently used: LFU (least frequently used) algorithm.
4. Random: a random no. is generated, and the corresponding block is selected for removal.
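The LRU policy (no. 2 above) can be sketched with an ordered dictionary standing in for the usage-order bookkeeping that real hardware keeps. A minimal sketch; the 4-line capacity and the block reference string are assumptions for illustration:

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative toy cache with LRU replacement."""
    def __init__(self, lines=4):
        self.lines, self.store = lines, OrderedDict()

    def access(self, block):
        if block in self.store:              # hit: mark block most recently used
            self.store.move_to_end(block)
            return True
        if len(self.store) >= self.lines:    # miss and full: evict the LRU block
            self.store.popitem(last=False)
        self.store[block] = None             # load the new block
        return False

c = LRUCache()
hits = [c.access(b) for b in (1, 2, 3, 4, 1, 5, 2)]
print(hits)
```

After the reference string above, block 1 is the only hit; block 5 evicts block 2 (the least recently used), which then misses again.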
Block Replacement Algorithm
◮ A block should be writtenback to main memory only ifit is changed, otherwise not.
◮ For each row in the cache, which corresponds to a block, a special bit (sticky bit or dirty bit, as it is called) is allocated. If the block is modified, the bit is set, else it remains reset. This bit helps to decide whether or not to write this line back to RAM.
How much have you learned about the direct-mapped cache?
1. If the same block numbers from two regions are required in the cache, what is the consequence? That is, can they be present at the same time in the cache?
2. Does the CPU fetch directly from RAM if the cache is full?
3. At how many alternate locations in the cache can a block be resident?
Associative Cache Memories
◮ Many storage and retrieval problems require accessing certain subfields within a set of records, e.g., what is John's ID no. and age?
Name ID number Age
--------------------
J. Smith 124 24
J. Bond 007 40
A. John 106 50
R. Roe 002 19
J. Doe 009 28
-------------------
◮ Row 3 has no logical relationship to the name John; hence one has to search the entire table using the name sub-field as an address. But that is slow.
◮ The associative memorysimultaneously examines allthe entries. They are alsoknown as content addressablememories. The subfield chosenis called tag or key.
◮ Items stored in Associativememories can be viewed as:
KEY, DATA
Other cache types
Associative mapping Cache
Memories:
◮ A memory block (0, . . . , 2^s − 1) of main memory can go (be mapped) anywhere (to any line) in the cache. This removes the drawback of the direct-mapped cache. A cache of 2^r lines may hold any 2^r blocks out of the total 2^s blocks of RAM. The larger the size of the cache, the larger the no. of RAM blocks it can hold.
◮ Since any 2^r of the 2^s blocks may be in the cache, we need a tag field of s bits along with each line of the cache.
◮ Steps to find out if a block no. b (0 ≤ b < 2^s) is in the cache:
1. The total address space = s + w bits; there are 2^s blocks in RAM and each block has 2^w words.
2. The s bits from the address space are compared (in parallel) with all the tags of the 2^r lines of the cache. The line in which a match occurs holds the block having the required word corresponding to the address field. If no match occurs with any tag, it is a cache miss, and the required block is fetched from RAM.
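The parallel tag comparison can be imitated in software with a tag table. A minimal sketch (the 2-line size, the eviction choice, and the block numbers are assumptions; real hardware compares all tags simultaneously, and the replacement policy is not modelled):

```python
class AssociativeCache:
    """Toy fully associative cache: any block may occupy any line."""
    def __init__(self, lines):
        self.lines = lines
        self.tags = {}                             # s-bit tag (block no.) -> data

    def read(self, block_no):
        if block_no in self.tags:                  # tag match in some line: hit
            return "hit"
        if len(self.tags) >= self.lines:           # full: evict an arbitrary line
            self.tags.pop(next(iter(self.tags)))
        self.tags[block_no] = "block data"         # fetch the block from RAM
        return "miss"

c = AssociativeCache(lines=2)
results = [c.read(b) for b in (7, 7, 9, 12, 7)]
print(results)
```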
Set Associative Cache
Set Associative cache memories:
◮ If there are ten cache lines to which a RAM location is mapped, ten cache entries must be searched. That takes power, chip area, and time. Caches with more associativity have fewer misses. The rule of thumb: doubling the associativity, from direct mapped to 2-way, or from 2-way to 4-way, has the same effect on hit ratio as doubling the cache size.
◮ A set-associative scheme is a hybrid between fully associative and direct mapped, and a reasonable compromise between the complex hardware of fully associative v/s the simple hardware of direct-mapped.
◮ The solution is the set associative cache.
◮ Consider a cache of size 8 lines, and a RAM of 32 blocks. If the cache is fully associative, a block (say block no. 12) can go in any of the 8 rows of the cache.
◮ In a set associative cache, block 12 will be mapped to a set in the cache. A set can be of size 1, 2, or 4 rows here. For two sets (each of 4 rows), block 12 will be mapped to one particular set. This is called a 4-way set associative cache.
◮ If the set is represented by u bits in the address field, then the set no. can be found by the index of u bits. The tag field of each row = s − u bits.
Set Associative Cache
◮ Compromises: Requires comparison hardware to find the correct slot out of a small set of slots, instead of all the slots. Such hardware is linear in the number of slots (rows in a set).
◮ Flexibility of allowing up to N cache lines per set for an N-way set associative cache.
◮ Algorithm to find a cache hit:
1. Pick up the u bits out of the total (s − u) + u bits of the block address; use the u bits as an index to reach the corresponding set (out of 2^u sets) in the cache.
2. Compare (in parallel) the s − u bits from the address field with the tag fields of all the lines in that set.
3. If any match occurs, it is a hit, and the line whose tag matched has the required block. So the byte from that word is transferred to the CPU. Else, it is a miss, and the block is fetched from RAM.
Block address format: Tag (s − u bits, compared for the hit) | Index (u bits, selects the set) | block offset (w bits, not required for comparison, as the entire block is either present or absent in the cache). Total word address = s + w bits, i.e., 2^(s+w) words.
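The three steps above can be sketched directly. A toy example with assumed sizes (u = 2 index bits, i.e. 4 sets, 2-way associativity) and an assumed reference string; real hardware compares the tags of a set in parallel:

```python
U, WAYS = 2, 2                          # assumed: u = 2 index bits, 2-way sets
sets = [[] for _ in range(1 << U)]      # each set holds up to WAYS tags

def access(block_no):
    index = block_no & ((1 << U) - 1)   # step 1: u index bits select the set
    tag = block_no >> U                 # step 2: the s - u tag bits to compare
    lines = sets[index]
    if tag in lines:                    # step 3: a tag match is a hit
        return "hit"
    if len(lines) >= WAYS:              # miss: replace a line within this set
        lines.pop(0)
    lines.append(tag)
    return "miss"

pattern = [access(b) for b in (12, 12, 28, 44, 12)]
print(pattern)
```

Blocks 12, 28, and 44 all share index 0, so they compete for the same 2-line set; the third of them evicts the first.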
Analysis
◮ In fully associative, the u length is zero. For a given size of cache, we observe the following:
◮ Tag-index boundary moves to the right ⇒ decreases index size u ⇒ increases no. of blocks per set ⇒ increases size of tag field ⇒ increases associativity.
◮ Direct mapped cache is set associative with set size = 1.
◮ Tag-index boundary moves to the left ⇒ increases index size u ⇒ decreases no. of blocks per set ⇒ decreases size of tag field ⇒ decreases associativity.
◮ What are the effects of:
1. Index boundary at the maximum left?
2. Index boundary at the maximum right?
3. How far can it go left/right?
4. What is the best position of the index boundary?
Write policy
◮ When a new block is to be fetched into the cache, and there is no space in the cache to load that block, there is a need to create space in the cache. For this, one of the old blocks is to be lifted off as per the algorithms discussed earlier.
◮ If a line or block frame is not modified, then it may be overwritten, as it does not require updating into RAM.
◮ If the block is modified in RAM (possible when multiple processors with multiple caches exist, or it is modified by a device), then the cache is invalid.
◮ One of the techniques to solve this problem is called write through, which ensures that main memory is always valid. Consequently, if any change has taken place, it should be updated into RAM immediately.
- disadv: Wastage of bandwidth due to frequent updating of RAM.
- adv: Easier to implement; cache is always clean; read misses (in cache) never lead to writes to RAM.
- RAM has the most current copy of the data, hence it simplifies data coherency.
Write policy
◮ When write through takes place, the CPU stalls. To solve this, the data is written to a buffer before it goes to RAM.
◮ Write back: The cache is written (updated) into RAM only when a block is to be removed to create space in the cache and the dirty bit of the cache line is found set.
- Consumes lesser BW for writes.
- Suitable for servers (with multiple processors).
- Good for embedded applications, as it saves power.
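The difference in RAM traffic between the two policies can be sketched with counters. A minimal sketch (the workload of five writes to one cached word is an assumption; a real cache keeps one dirty bit per line):

```python
ram_writes = {"through": 0, "back": 0}
dirty = False

def write_through(value):
    ram_writes["through"] += 1    # every store is propagated to RAM at once

def write_back(value):
    global dirty
    dirty = True                  # only the cache line is updated; mark it dirty

def evict():
    global dirty
    if dirty:                     # flush to RAM only if the line was modified
        ram_writes["back"] += 1
        dirty = False

for v in range(5):                # five writes to the same cached word
    write_through(v)
    write_back(v)
evict()                           # the line is finally replaced
print(ram_writes)
```

Write through pays one RAM write per store; write back pays one per eviction of a dirty line, which is the bandwidth saving mentioned above.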
◮ Shared memory architecture:
◮ Two main problems with a shared memory system: performance degradation due to contention, and coherence problems.
- Multiple processors may try to access the shared memory simultaneously.
◮ Having multiple copies of data, spread throughout the caches, might lead to a coherence problem.
◮ The copies in the caches are coherent if they are all equal to the same value.
Cache coherence
◮ Coherence(def.): A sticking orcleaving together; union ofparts of the same body;cohesion.
- Connection or dependence,proceeding from thesubordination of the parts of athing to one principle orpurpose, as in the parts of adiscourse, or of a system ofphilosophy; a logical andorderly and consistent relationof parts; consecutiveness.
Coherence of discourse, and adirect tendency of all the partsof it to the argument in hand,are most eminently to befound in him. –Locke.
Figure 8: Multiple caches using a shared resource. (Client 1 with cache 1 and client 2 with cache 2 both access the main memory; coherency is required between the caches.)
◮ Client 1 modifies cache 1, but does not write to main memory.
◮ Client 2 accesses the memory block which was modified in cache 1.
◮ The result is an invalid data access by client 2.
Achieving coherence
◮ Principle of coherence: client 1 reads location X, modifies it, and writes location X. In between, no other processor accesses X. This results in memory consistency; otherwise not.
Cache coherence mechanisms:
◮ Directory-based coherence: Data being shared is placed in a common directory, which acts as a filter through which a processor must ask permission to load an entry from RAM into its cache. When an entry is changed, the directory updates or invalidates the other caches holding that entry.
◮ Snoopy-bit: Each individual cache monitors the address lines for RAM locations that it has cached. When a write operation is observed to a location (X) whose copy is in the cache, the cache controller invalidates its own copy of that location (X).
◮ Snarfing: The cache controller watches both the address and data lines, and if RAM location (X) gets modified, the cache updates its own copy of (X).
◮ The coherence algorithms above, and others, maintain consistency between all caches in a system of distributed shared memory.
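The invalidation behaviour above can be sketched with two dictionaries standing in for the two caches. A toy sketch (the location name X and the values are assumptions; the bus snooping itself is modelled as a plain loop):

```python
caches = [{"X": 3}, {"X": 3}]     # client 0 and client 1 both hold a copy of X

def write(client, key, value):
    caches[client][key] = value   # the writer updates its own copy
    for i, c in enumerate(caches):
        if i != client:
            c.pop(key, None)      # snooping caches invalidate their stale copies

write(0, "X", 7)
print(caches)
```

After the write, client 1 no longer holds X, so its next access misses and fetches the current value instead of reading stale data.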
Snooping Protocols
◮ They are based on watchingbus activities and carry outthe appropriate coherencycommands when necessary.
◮ Each block has a stateassociated with it, whichdetermines what happens tothe entire contents of theblock.
◮ The state of a block mightchange due to: Read-Miss,Read-Hit, Write-Miss, andWrite-Hit.
◮ Cache miss: Requested blockis not in the cache or it is inthe cache but has beeninvalidated.
◮ Snooping protocols differ inwhether they update or
invalidate shared copies inremote caches in case of awrite operation.
◮ They also differ as from whereto obtain the new data in thecase of a cache miss.
◮ State: Valid, Invalid.
◮ Event: Read-Hit, Read-Miss,Write-Hit, Write-Miss, Blockreplacement.
◮ Write-Invalidate and Write-Through: Memory is always consistent with the most recently updated cache copy.
◮ Write-Invalidate and Write-Back
Instruction Cache, data cache, Cache levels
◮ Harvard Cache: Separate dataand instruction caches. Allowsaccesses to be less random.
Both have locality of referenceproperty.
◮ Multilevel cache hierarchy:Internal cache, external cache,L1,L2,L3, etc.
◮ Cache Performance: 1. Hit ratio, 2. Average memory access time.
- Average memory access time = Hit time + Miss rate × Miss penalty (i.e., no. of clock cycles missed).
- CPU time = (CPU execution clock cycles + memory stall clock cycles) × clock cycle time.
- Miss penalty = (memory stall cycles / instruction) = (Misses/Instruction) × (Total miss latency − Overlapped miss latency).
- Overlapped miss latency is due to out-of-order CPUs stretching the hit time.
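A quick numeric instance of the average-memory-access-time formula above. The hit time (1 cycle), miss rate (2%), and miss penalty (50 cycles) are assumed values for illustration:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = Hit time + Miss rate * Miss penalty
    return hit_time + miss_rate * miss_penalty

print(amat(1.0, 0.02, 50))
```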
Cache Positioning
◮ A cache is always positioned between RAM and the CPU. A programmer is unaware of its position, because most of the cache operation and management is done by hardware, and a very small part by the operating system.
Figure 9: Cache positioning. (The cache connects to the CPU via address, data, and control lines, and to the system bus through an address buffer and a data buffer, with control signals on both sides.)
Virtual to Physical Address Translation
◮ In virtual memory system, the logical address needs to be translatedinto physical address, before instruction/data is fetched.
◮ For cache memory this may be done by cache itself (in that casecache shall be fast) or by MMU (the cache will have to wait till theaddress translation is done).
Two arrangements (figure):
- CPU → cache → MMU → main memory: the CPU issues the logical address to the cache; the MMU translates it to the physical address for main memory. Note: the cache needs to do address translation.
- CPU → MMU → cache → main memory: the MMU (memory management unit) does the job of address translation from virtual address to physical address. Note: the cache gets the translated (physical) address; hence, the cache design is simple.
◮ In the first, the cache itself is doing address translation; in the second, address translation is done by the MMU, and the cache sees only the physical address.
Exercises
1. A block-set-associative cache consists of a total of 64 blocks divided into 4-block sets. The main memory consists of 4096 blocks, each consisting of 128 words.
1.1 How many bits are there in a main memory address?
1.2 How many bits are there in each of the TAG, SET, and WORD fields?
2. A 2-way set associative cache memory uses blocks of four words. The cache can accommodate a total of 2048 words from main memory. The main memory size is 128K × 32.
2.1 Formulate all the required information to construct the cachememory.
2.2 What is size of the cache memory?
3. Let us consider a memory hierarchy (main memory + cache) given by:
◮ Memory size 1 Giga words of 16 bits (word addressed)
◮ Cache size 1 Mega words of 16 bits (word addressed)
◮ Cache block size 256 words of 16 bits
Exercises
Let us consider the following cache structures:
◮ direct mapped cache;
◮ fully associative cache;
◮ 2-way set-associative cache;
◮ 4-way set-associative cache;
◮ 8-way set-associative cache.
◮ Calculate the structure of the addresses for the previous cache structures;
◮ Calculate the number of blocks for the previous cache structures;
◮ Calculate the number of sets for the previous set associative caches.
4. A cache memory is usually divided into lines. Assume that a computer has a memory of 16 MB, and a cache size of 64 KB. A cache block can contain 16 bytes.
4.1 Determine the length of the tag, index and offset bits of the address for: a. Direct Mapped Cache, b. 2-way Set Associative Cache, c. Fully Associative Cache.
4.2 Assuming a memory has 32 blocks and a cache consists of 8 blocks, determine where the 13th memory block will be found in the cache for: a. Direct Mapped Cache, b. 2-way Set Associative Cache, c. Fully Associative Cache.
Exercises
5. Consider a machine with a byte addressable main memory of 2^16 bytes and a block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used with this machine.
5.1 How is the 16-bit memory address divided into tag, line number, and byte number?
5.2 Into what line would bytes with each of the following addresses be stored?
0001 0001 1001 1011
1100 0011 0010 0100
1010 1010 0110 1010
5.3 Suppose the byte with address 0001 0001 1001 1011 is stored in the cache. What are the addresses of the other bytes stored along with it?
5.4 How many total bytes of memory can be stored in the cache?
5.5 Why is the tag also stored in the cache?
Exercises
6. In a cache-based memory system using FIFO for cache page replacement, it is found that the cache hit ratio H is unacceptably low. The following proposals are made for increasing H:
6.1 Increase the cache page size.
6.2 Increase the cache storage capacity.
6.3 Increase main memory capacity.
6.4 Replace the FIFO policy by LRU.
Analyse each proposal to determine its probable impact on H.
7. Consider a system containing a 128-byte cache. Suppose that set-associative mapping is used in the cache, and that there are four sets each containing four lines. The physical address size is 32 bits, and the smallest addressable unit is the byte.
7.1 Draw a diagram showing the organization of the cache and indicating how physical addresses are related to cache addresses.
7.2 To what lines of the cache can the address 000010AF₁₆ be assigned?
7.3 If the addresses 000010A₁₆ and FFFF7Axy₁₆ are simultaneously assigned to the same cache set, what values can the address digits x and y have?
Exercises
8. Discuss briefly the advantages and disadvantages of the following cache designs, which have been proposed and in some cases implemented. Identify three nontrivial advantages or disadvantages (one or two of each) for each part of the problem:
8.1 An instruction cache, which only stores program code but not data.
8.2 A two-level cache, where the cache system forms a two-level memory hierarchy by itself. Assume that the entire cache subsystem will be built into the CPU.
9. A set associative cache comprises 64 lines, divided into four-line sets. The main memory contains 8K blocks of 64 words each. Show the format of main memory addresses.
10. A two-way set associative cache has lines of 16 bytes and a total size of 8 kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.
11. Consider a 32-bit microprocessor that has an on-chip 16-Kbyte four-way set associative cache. Assume that the cache has a line size of four 32-bit words. Draw a block diagram of this cache showing its organization and how the different address fields are used to determine a cache hit/miss. Where in the cache is the word from memory location ABCDE8F8₁₆ mapped?
Bibliography
John P. Hayes, “Computer Architecture and Organization”, 2nd Edition, McGraw-Hill, 1988.
William Stallings, “Computer Organization and Architecture”, 8th Edition, Pearson, 2010.
M. Morris Mano, “Computer System Architecture”, 3rd Edition, Pearson Education, 2006.
Carl Hamacher, Zvonko Vranesic, and Safwat Zaky, “Computer Organization”, 5th Edition, McGraw-Hill Education, 2011. (chapter 7)