Cache Coherence in Multiprocessors


CS-421 Parallel Processing BE (CIS) Batch 2004-05 Cache Coherence in Multiprocessors


Multiprocessor Cache Coherence

In a shared-memory multiprocessor in which each processor has its own cache, a write to a cached block makes that block differ from its copies in shared memory and in the other caches, which creates an inconsistency across the processors. To guarantee that every processor accesses up-to-date data, we need a mechanism that maintains coherence among the caches and shared memory. Such mechanisms, known as multiprocessor cache coherence protocols, fall into two categories:

1. Snooping Protocols
   For bus-based shared-memory multiprocessors

2. Directory-Based Protocols
   For distributed shared-memory multiprocessors built on other interconnection networks (ICNs)

Snooping Protocols

Every processor is equipped with a snoopy controller (a bus watcher) that monitors bus activity. These protocols are of two types:

1. Write-Invalidate (WI)
   o A CPU that wants to write to an address acquires control of the bus and broadcasts a write-invalidate message.
   o Every snooping processor that holds the block in its cache invalidates its copy.
   o Hence, any subsequent access to this block by a processor other than the writer causes a miss and triggers a re-fetch of the data from global shared memory.

2. Write-Update (WU)
   o A CPU that wants to write to an address acquires control of the bus and broadcasts the new data as it updates its own copy.
   o Every snooping cache that holds the block updates its copy.

Comparison: WI vs. WU

Multiple writes to the same block require only ONE invalidate message under WI, but one update message per write under WU.

o Because of both spatial and temporal locality, repeated writes to the same block occur quite often.
o Bus bandwidth is a precious commodity in shared-memory multiprocessors.
o Experience has shown that invalidate protocols use significantly less bandwidth, but they are slower because time is spent invalidating the block and then servicing the resulting cache miss.
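The bandwidth argument above can be checked with a small model. The following is a minimal sketch, assuming a single block that every cache initially holds, one writer (CPU 0), and no reads by other CPUs between the writes; the function name `run_writes` and its parameters are hypothetical and chosen only for illustration.

```python
def run_writes(policy, n_writes, n_caches=4):
    """Count bus broadcasts for repeated writes by CPU 0 to one block."""
    # Assumption: every cache starts with a valid copy of the block (value 0).
    copies = {cpu: 0 for cpu in range(n_caches)}  # cpu -> cached value
    writer = 0
    bus_messages = 0
    for value in range(1, n_writes + 1):
        others = [cpu for cpu in copies if cpu != writer]
        if policy == "invalidate":
            if others:
                # Other copies exist: broadcast one invalidate message.
                bus_messages += 1
                for cpu in others:
                    del copies[cpu]  # snooping caches drop their copy
        elif policy == "update":
            if others:
                # Other copies exist: broadcast the new data on every write.
                bus_messages += 1
                for cpu in others:
                    copies[cpu] = value  # snooping caches update their copy
        else:
            raise ValueError(f"unknown policy: {policy}")
        copies[writer] = value
    return bus_messages, copies


for policy in ("invalidate", "update"):
    print(policy, run_writes(policy, n_writes=4))
```

In this model four writes cost one bus message under write-invalidate and four under write-update. The price of write-invalidate shows up later: the other caches no longer hold the block, so their next access misses.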

Here we describe MESI, a write-invalidate, write-back multiprocessor cache coherence protocol. This protocol is used in the Intel Pentium, the PowerPC 601, and the MIPS R4400 (used in the SGI Challenge multiprocessor).


MESI Protocol States

Each block in a cache has one of the following four states associated with it.

1. Exclusive (E)
   The block is clean, that is, identical to the copy in shared memory, and no other processor's cache has a copy of it.

2. Shared (S)
   The block is clean, and at least one other processor's cache has a clean copy of it.

3. Modified (M)
   The block has been modified by a recent local write and therefore differs from the copy in shared memory. No other processor's cache has a valid copy of it.

4. Invalid (I)
   The data in the block is not valid, e.g. the contents at power-up.
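The four states can be written down as a small enumeration. This is a sketch for illustration only; the class name `MESI` and the helper properties are hypothetical names that restate the definitions above and are not part of any particular processor's implementation.

```python
from enum import Enum


class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

    @property
    def is_valid(self):
        # Every state except I holds usable data.
        return self is not MESI.INVALID

    @property
    def is_clean(self):
        # E and S copies are identical to shared memory.
        return self in (MESI.EXCLUSIVE, MESI.SHARED)

    @property
    def is_only_valid_copy(self):
        # M and E guarantee that no other cache holds a valid copy.
        return self in (MESI.MODIFIED, MESI.EXCLUSIVE)


for state in MESI:
    print(state.value, state.is_valid, state.is_clean, state.is_only_valid_copy)
```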

Operation

The state of a cache line (or block) changes in response to memory access events. These events are of two types:

o Local processor activity
o Bus activity

The MESI operation can be understood in terms of the events detailed below.

LOCAL Read Hit

o The line must be in one of the M, E, or S states.
o No state change is needed in response to this event.

LOCAL Read Miss

a) No other cache contains this block
   o The requesting processor makes a bus request for memory access.
   o The block is retrieved from global shared memory into the processor's cache and marked E.

b) One other cache has an E copy
   o The requesting processor makes a bus request for memory access.
   o The snooping processor that holds the E copy causes the requesting processor to abandon the memory access and puts its copy of the block on the bus.
   o The requesting processor gets the block in its cache.
   o Both processors mark the block S.


c) Several caches have an S copy
   o The requesting processor makes a bus request for memory access.
   o One of the snooping processors that holds an S copy wins bus access through arbitration, causes the requesting processor to abandon the memory access, and puts its copy of the block on the bus.
   o The requesting processor gets the block in its cache and marks it S.
   o Copies of the block in the other caches remain S.

d) One cache has an M copy
   o The requesting processor makes a bus request for memory access.
   o The snooping processor that holds the M copy causes the requesting processor to abandon the memory access and puts its copy of the block on the bus.
   o The requesting processor gets the block in its cache.
   o Both processors mark the block S.
   o The modified block is also written back to shared memory.
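The four read-miss cases can be summarized as one decision function. This is a minimal sketch, assuming each state is written as a one-letter string and that `other_states` lists the state of the block in every other cache; the function name `local_read_miss` and its return tuple are hypothetical.

```python
def local_read_miss(other_states):
    """Resolve a local read miss.

    Returns (requester_state, new_other_states, supplier, write_back).
    """
    holders = [s for s in other_states if s != "I"]
    if not holders:
        # Case a): no other cache has the block, so it comes from memory as E.
        return "E", list(other_states), "memory", False
    # Cases b), c), d): a snooping cache supplies the block, and every
    # valid copy, including the requester's, ends up Shared.
    write_back = "M" in holders  # case d): the modified block also updates memory
    new_others = ["I" if s == "I" else "S" for s in other_states]
    return "S", new_others, "cache", write_back


print(local_read_miss(["I", "I"]))  # case a)
print(local_read_miss(["E", "I"]))  # case b)
print(local_read_miss(["S", "S"]))  # case c)
print(local_read_miss(["M", "I"]))  # case d)
```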

LOCAL Write Hit

The line must be in one of the M, E, or S states.

a) The block is in M state
   o The line is already modified; therefore no state change is warranted.

b) The block is in E state
   o The state changes from E to M.

c) The block is in S state
   o The updating processor broadcasts an invalidate message on the bus to let the other processors know about the update.
   o Snooping processors that hold an S copy of the block change its state from S to I.
   o The updating processor marks the block M.
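A write hit touches the bus only in the S case. The sketch below restates the three cases, assuming one-letter state strings; `local_write_hit` and `snoop_invalidate` are hypothetical names.

```python
def local_write_hit(state):
    """Return (new_state, broadcast_invalidate) for the writing cache."""
    if state in ("M", "E"):
        # The writer already holds the only valid copy: no bus traffic.
        return "M", False
    if state == "S":
        # Other caches may hold the block: invalidate them first.
        return "M", True
    raise ValueError("a write hit requires a line in state M, E, or S")


def snoop_invalidate(state):
    """State of a snooping cache's copy after it sees the invalidate."""
    return "I" if state == "S" else state


new_state, must_broadcast = local_write_hit("S")
others = [snoop_invalidate(s) for s in ["S", "I", "S"]] if must_broadcast else []
print(new_state, must_broadcast, others)
```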

LOCAL Write Miss

a) No other cache contains this block
   o The block is cached from main memory and updated.
   o The state changes from I to M.

b) S or E copy in other caches
   o The bus transaction is marked with the RWITM (Read With Intent To Modify) signal.
   o Snooping processors that hold an S or E copy of this block observe this bus activity and mark their copies I, and one of them provides the required block to the requesting processor.
   o The updating processor marks the block M.


c) M copy in another cache
   o The requesting processor issues the RWITM signal.
   o The snooping processor that holds the M copy observes this bus activity and responds as follows:
      o Blocks the RWITM request
      o Takes control of the bus
      o Writes the block back to shared memory
      o Sets its copy of the data to I
   o The requesting processor re-issues the RWITM signal.
   o This now becomes the same case as a) above.
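The write-miss cases differ only in who supplies the block and whether a write-back happens first; the requester always ends in M and every other copy ends in I. The following is a sketch under the same assumptions as the text, with one-letter state strings; the function name `local_write_miss` and its return tuple are hypothetical.

```python
def local_write_miss(other_states):
    """Resolve a local write miss.

    Returns (requester_state, new_other_states, supplier, write_back).
    """
    holders = [s for s in other_states if s != "I"]
    invalidated = ["I"] * len(other_states)  # no other valid copy survives
    if "M" in holders:
        # Case c): the owner blocks RWITM, writes the block back, and
        # invalidates its copy; the re-issued RWITM is then served by memory.
        return "M", invalidated, "memory", True
    if holders:
        # Case b): S or E holders invalidate and one of them supplies the block.
        return "M", invalidated, "cache", False
    # Case a): no other cache holds the block, so memory supplies it.
    return "M", invalidated, "memory", False


print(local_write_miss(["I", "I"]))  # case a)
print(local_write_miss(["S", "S"]))  # case b)
print(local_write_miss(["M", "I"]))  # case c)
```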

State Transition Diagram

[Figure: MESI state transition diagram over the states M, E, S, and I, with edges labeled by the events in the legend.]

Legend: RH: Read Hit; SRH: Snoop Read Hit; RM: Read Miss; SWH: Snoop Write Hit; WH: Write Hit; WM: Write Miss
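The transitions described in the text can be collected into a lookup table keyed by the legend's event abbreviations. The table is derived from the case descriptions in this document, not from the original figure, so it is a sketch of the protocol as described here; `TRANSITIONS`, `next_state`, and the `other_copies` flag are hypothetical names.

```python
# (current state, event) -> next state, for one cache line.
TRANSITIONS = {
    ("M", "RH"): "M", ("E", "RH"): "E", ("S", "RH"): "S",     # read hits
    ("M", "WH"): "M", ("E", "WH"): "M", ("S", "WH"): "M",     # write hits
    ("I", "WM"): "M",                                         # write miss
    ("M", "SRH"): "S", ("E", "SRH"): "S", ("S", "SRH"): "S",  # another CPU reads
    ("M", "SWH"): "I", ("E", "SWH"): "I", ("S", "SWH"): "I",  # another CPU writes
}


def next_state(state, event, other_copies=False):
    """Next state of a line; `other_copies` matters only for a read miss."""
    if (state, event) == ("I", "RM"):
        # A read miss yields S if another cache holds the block, else E.
        return "S" if other_copies else "E"
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event} does not apply in state {state}") from None


# Trace one line through a sequence of events, starting from Invalid.
state = "I"
trace = [state]
for event in ("RM", "WH", "SRH", "SWH"):
    state = next_state(state, event)
    trace.append(state)
print(" -> ".join(trace))
```

The trace walks I, E, M, S, and back to I: a read miss with no other copy, a local write, a read by another processor, and finally a write by another processor.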