Cache Coherence in Multiprocessors
CS-421 Parallel Processing BE (CIS) Batch 2004-05 Cache Coherence in Multiprocessors
Multiprocessor Cache Coherence
In a shared-memory multiprocessor system in which each processor has its own cache, a write to a cached block makes
that block differ from its copies in shared memory and in the other caches in the system, and therefore creates an
inconsistency across the processors. To guarantee that every processor accesses clean data blocks, we need a
mechanism that maintains coherence among the caches and shared memory. Such mechanisms, known as
multiprocessor cache coherence protocols, fall into two categories:
1. Snooping Protocols
   For bus-based shared-memory multiprocessors
2. Directory-Based Protocols
   For distributed shared-memory multiprocessors built on other interconnection networks (ICNs)
Snooping Protocols
Every processor is equipped with a snoopy controller (i.e. a bus watcher) that snoops on, i.e. monitors, the bus activity.
These protocols are of two types:
1. Write-Invalidate (WI)
   o A CPU wanting to write to an address grabs control of the bus and broadcasts a write-invalidate message
   o All snooping processors invalidate their copy of the block in question if that block resides in their caches
   o Hence, any subsequent access to this block by another processor (i.e. other than the updating processor)
     causes a miss and triggers a re-fetch of the data from global shared memory
2. Write-Update (WU)
   o A CPU wanting to write to an address grabs control of the bus and broadcasts the new data as it updates its
     own copy
   o All snooping caches update their copy of the block in question
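As a rough illustration of the two snooping policies, the sketch below models each cache as a Python dict from address to value, where a missing key stands for an invalid (uncached) block. The function name `broadcast_write` and this data layout are assumptions made for the example, not part of any real protocol specification.

```python
def broadcast_write(caches, writer, addr, value, policy):
    """Apply one write by `writer` and let every other cache snoop it.

    caches: dict mapping cpu id -> dict mapping address -> cached value
            (hypothetical layout; a missing address means "invalid").
    policy: "WI" (write-invalidate) or "WU" (write-update).
    """
    if policy not in ("WI", "WU"):
        raise ValueError(f"unknown policy: {policy!r}")
    caches[writer][addr] = value          # the writer updates its own copy
    for cpu, cache in caches.items():
        if cpu == writer or addr not in cache:
            continue                      # this snooper holds no copy
        if policy == "WI":
            del cache[addr]               # invalidate: the next access misses
        else:
            cache[addr] = value           # update: the copy stays valid


wi = {"P0": {0x40: 1}, "P1": {0x40: 1}}
broadcast_write(wi, "P0", 0x40, 7, "WI")  # P1 loses its copy of 0x40

wu = {"P0": {0x40: 1}, "P1": {0x40: 1}}
broadcast_write(wu, "P0", 0x40, 7, "WU")  # P1 now holds the new value 7
```

Under WI the other cache must re-fetch the block on its next access, whereas under WU it keeps a valid copy at the cost of receiving every write.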
Comparison: WI vs. WU
Multiple writes to the same block require only ONE invalidate message under WI, but one update message per write
under WU.
o Due to both spatial and temporal locality, such repeated writes to a block occur quite often
o Bus bandwidth is a precious commodity in shared-memory multiprocessors
o Experience has shown that invalidate protocols use significantly less bandwidth, but they can be slower
  because time is spent invalidating the block and then serving the resulting cache miss
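The bandwidth argument can be made concrete with a small count. The sketch below assumes that one processor writes the same shared block repeatedly and that no other processor touches the block in between; `bus_broadcasts` is a hypothetical helper, and real traffic also depends on block size and on the misses that follow an invalidation.

```python
def bus_broadcasts(num_writes, policy):
    """Count write-related bus broadcasts for repeated writes to one block.

    Assumes a single writer and no intervening access by other processors.
    """
    if num_writes < 0:
        raise ValueError("num_writes must be non-negative")
    if policy == "WI":
        # Only the first write needs an invalidate; afterwards the writer
        # holds the sole valid copy and later writes stay local.
        return 1 if num_writes > 0 else 0
    if policy == "WU":
        # Every write must broadcast the new data to the other caches.
        return num_writes
    raise ValueError(f"unknown policy: {policy!r}")


for n in (1, 10, 100):
    print(n, bus_broadcasts(n, "WI"), bus_broadcasts(n, "WU"))
```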
Here we describe MESI, a write-invalidate, write-back multiprocessor cache coherence protocol. This protocol is
used in the Intel Pentium, the PowerPC 601 and the MIPS R4400 (used in the SGI Challenge multiprocessor).
MESI Protocol States
Each block in a cache has one of the following four states associated with it.
1. Exclusive (E)
   The block is clean, that is, identical to the copy in shared memory, and no other processor’s cache has a copy
   of this block
2. Shared (S)
   The block is clean and at least one other processor’s cache has a clean copy of this block
3. Modified (M)
   The block has been modified by a recent write hit to the cache. No other processor’s cache has a valid copy
   of this block
4. Invalid (I)
   The data in the block is not valid, e.g. the contents at power-up.
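The four states can be written down as a small enumeration. This is only a sketch: the class name `MESI` and the property names are chosen for illustration, and each property simply restates the definitions given above.

```python
from enum import Enum


class MESI(Enum):
    """The four per-block states (names and properties are illustrative)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

    @property
    def is_valid(self):
        # Every state except I holds usable data.
        return self is not MESI.INVALID

    @property
    def is_clean(self):
        # E and S blocks match the copy in shared memory.
        return self in (MESI.EXCLUSIVE, MESI.SHARED)

    @property
    def other_caches_may_hold_copy(self):
        # Only S allows valid copies in other processors' caches.
        return self is MESI.SHARED


for state in MESI:
    print(state.value, state.is_valid, state.is_clean,
          state.other_caches_may_hold_copy)
```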
Operation
The state of a cache line (or block) changes as a function of memory access events. These events may be of two types:
o Local processor activity
o Bus activity
The MESI operation can be understood in terms of the events that may take place, as detailed below.
LOCAL Read Hit
o The line must be in one of the M, E, or S states
o The read is served from the local cache, so no state change is needed in response to this event
LOCAL Read Miss
a) No other cache contains this block
   o The requesting processor makes a bus request for memory access
   o The block is retrieved from global shared memory into the processor’s cache and marked as E
b) One other cache has an E copy
   o The requesting processor makes a bus request for memory access
   o The snooping processor having the E copy in its cache causes the requesting processor to abandon
     the memory access and puts its copy of the block on the bus
   o The requesting processor gets the block in its cache
   o Both processors mark the block as S
c) Several caches have an S copy
   o The requesting processor makes a bus request for memory access
   o One of the snooping processors having an S copy of the block in its cache gets bus access
     through arbitration, causes the requesting processor to abandon the memory access and puts its
     copy of the block on the bus
   o The requesting processor gets the block in its cache and marks it as S
   o Copies of the block in the other caches remain S
d) One cache has an M copy
   o The requesting processor makes a bus request for memory access
   o The snooping processor having the M copy in its cache causes the requesting processor to abandon
     the memory access and puts its copy of the block on the bus
   o The requesting processor gets the block in its cache
   o Both processors mark the block as S
   o The modified block is also written back to shared memory
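The read hit and the four read-miss cases can be sketched as one function over the per-cache states of a single block. The dict layout, the function name `local_read` and the returned strings are assumptions made for this example; bus arbitration among several S holders is not modelled.

```python
def local_read(states, cpu):
    """Return (new_states, data_source, wrote_back) for a read by `cpu`.

    states: dict mapping cache id -> "M", "E", "S" or "I" for one block
            (hypothetical layout used only in this sketch).
    """
    new = dict(states)
    if states[cpu] in ("M", "E", "S"):
        return new, "own cache", False            # read hit: no state change
    holders = [c for c, s in states.items() if c != cpu and s != "I"]
    if not holders:
        new[cpu] = "E"                            # case a): fetched from memory
        return new, "memory", False
    # Cases b), c), d): a snooping cache supplies the block.
    wrote_back = any(states[c] == "M" for c in holders)  # case d) only
    for c in holders:
        new[c] = "S"
    new[cpu] = "S"
    return new, "another cache", wrote_back


print(local_read({"P0": "I", "P1": "I"}, "P0"))
print(local_read({"P0": "I", "P1": "M"}, "P0"))
```

In the second call the M copy in P1 is supplied to P0, both lines end up in S, and the returned flag records the write-back to shared memory.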
LOCAL Write Hit
o The line must be in one of the M, E, or S states
a) The block is in M state
   o The line is already modified; therefore no state change is warranted
b) The block is in E state
   o The state changes from E to M
c) The block is in S state
   o The updating processor broadcasts an invalidate message on the bus to let the other processors know
     about the update
   o Snooping processors having an S copy of the block change its state from S to I
   o The updating processor marks the block as M
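The three write-hit cases differ only in whether an invalidate must go out on the bus. A minimal sketch follows, assuming the same hypothetical layout of one state letter per cache for a single block; `local_write_hit` is an illustrative name.

```python
def local_write_hit(states, cpu):
    """Return (new_states, invalidate_sent) for a write hit by `cpu`.

    states: dict mapping cache id -> "M", "E", "S" or "I" for one block
            (hypothetical layout used only in this sketch).
    """
    state = states[cpu]
    if state not in ("M", "E", "S"):
        raise ValueError("a write hit requires the line to be in M, E or S")
    new = dict(states)
    invalidate_sent = False
    if state == "S":
        # Case c): tell the other sharers to drop their copies.
        invalidate_sent = True
        for c, s in states.items():
            if c != cpu and s == "S":
                new[c] = "I"
    # Case a) stays M, case b) goes E -> M, case c) goes S -> M.
    new[cpu] = "M"
    return new, invalidate_sent


print(local_write_hit({"P0": "S", "P1": "S", "P2": "I"}, "P0"))
```

Only case c) uses the bus; in cases a) and b) the write completes without any bus traffic, which is the benefit of tracking the E state separately from S.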
LOCAL Write Miss
a) No other cache contains this block
   o The block is cached from main memory and updated
   o The state changes from I to M
b) S or E copy in other caches
   o The bus transaction is marked with the RWITM (Read With Intent To Modify) signal
   o Snooping processors having an S or E copy of this block observe this bus activity and mark their copies
     as I, and one of them provides the required block to the requesting processor
   o The updating processor marks the block as M
c) M copy in another cache
   o The requesting processor issues the RWITM signal
   o The snooping processor having the M copy observes this bus activity and responds as follows:
     o Blocks the RWITM request
     o Takes control of the bus
     o Writes the block back to shared memory
     o Sets its copy of the data to I
   o The requesting processor re-issues the RWITM signal
   o This now becomes the same case as a) above
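The write-miss cases a) to c) can be traced with a short sketch that returns both the new states and the sequence of bus events. The event strings and the function name `local_write_miss` are hypothetical labels, and the sketch assumes that RWITM is also issued in case a), which the text implies when case c) falls back to it.

```python
def local_write_miss(states, cpu):
    """Return (new_states, bus_events) for a write miss by `cpu`.

    states: dict mapping cache id -> "M", "E", "S" or "I" for one block
            (hypothetical layout used only in this sketch).
    """
    if states[cpu] != "I":
        raise ValueError("a write miss requires the requester's line to be I")
    new = dict(states)
    events = ["RWITM"]
    owner = next((c for c, s in states.items() if c != cpu and s == "M"), None)
    holders = [c for c, s in states.items() if c != cpu and s in ("S", "E")]
    if owner is not None:
        # Case c): the M holder blocks the request, writes back, goes to I;
        # the re-issued RWITM is then served from memory as in case a).
        events += ["RWITM blocked", "write-back", "RWITM",
                   "memory supplies block"]
        new[owner] = "I"
    elif holders:
        # Case b): S/E holders invalidate and one of them supplies the block.
        for c in holders:
            new[c] = "I"
        events.append("cache supplies block")
    else:
        # Case a): no other copy exists, so memory supplies the block.
        events.append("memory supplies block")
    new[cpu] = "M"
    return new, events


print(local_write_miss({"P0": "I", "P1": "M"}, "P0"))
```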
State Transition Diagram
[Figure: MESI state transition diagram connecting the states M, E, S and I; the edge labels use the abbreviations below.]
Legend: RH: Read Hit; SRH: Snoop Read Hit; RM: Read Miss; SWH: Snoop Write Hit; WH: Write Hit; WM: Write Miss
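The transitions described under Operation can be collected into a lookup table for a single cache line. This is a sketch derived from the prose rather than a transcription of the diagram: the event names follow the legend, except that the read miss is split into the hypothetical `RM_ALONE` (no other cache holds the block) and `RM_SHARED` (some other cache holds it).

```python
# (current state, event) -> next state for one cache line.
# RM_ALONE / RM_SHARED are illustrative names, not part of the legend.
TRANSITIONS = {
    ("M", "RH"): "M", ("E", "RH"): "E", ("S", "RH"): "S",     # read hits
    ("M", "WH"): "M", ("E", "WH"): "M", ("S", "WH"): "M",     # write hits
    ("I", "RM_ALONE"): "E",                                   # read miss, case a)
    ("I", "RM_SHARED"): "S",                                  # read miss, cases b)-d)
    ("I", "WM"): "M",                                         # write miss
    ("M", "SRH"): "S", ("E", "SRH"): "S", ("S", "SRH"): "S",  # snooped reads
    ("M", "SWH"): "I", ("E", "SWH"): "I", ("S", "SWH"): "I",  # snooped writes
}


def next_state(state, event):
    """Look up the next state, rejecting pairs the text does not describe."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {state!r} on {event!r}") from None


# One line's history: fetched alone, written, read by a peer, written by a peer.
state = "I"
for event in ("RM_ALONE", "WH", "SRH", "SWH"):
    state = next_state(state, event)
    print(event, "->", state)
```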