ECE200 – Computer Organization
Chapter 9 – Multiprocessors
What we’ll cover today
Multiprocessor motivation
Multiprocessor organizations
Shared memory multiprocessors
Cache coherence
Synchronization
Multiprocessor motivation, part 1
Many scientific applications take too long to run on a single-processor machine
Modeling of weather patterns, astrophysics, chemical reactions, ocean currents, etc.
Many of these are parallel applications that largely consist of loops operating on independent data
Such applications can make efficient use of a multiprocessor machine, with each loop iteration running on a different processor and operating on independent data
Multiprocessor motivation, part 2
Many multi-user environments require more compute power than is available from a single-processor machine
Airline reservation system, department store chain inventory system, file server for a large department, web server for a major corporation, etc.
These workloads largely consist of parallel transactions that operate on independent data
Such applications can make efficient use of a multiprocessor machine, with each transaction running on a different processor and operating on independent data
Multiprocessor organizations
Shared memory multiprocessors
All processors share the same memory address space
Single copy of the OS (although some parts may be parallel)
Relatively easy to program and to port sequential code to
Difficult to scale to large numbers of processors
Uniform memory access (UMA) machine block diagram
Multiprocessor organizations
Distributed memory multiprocessors
Processors have their own memory address space
Message passing used to access another processor’s memory
Multiple copies of the OS
Usually commodity hardware and network (e.g., Ethernet)
More difficult to program
Easier to scale hardware and more inherently fault resilient
Multiprocessor variants
Non-uniform memory access (NUMA) shared memory multiprocessors
All memory can be addressed by all processors, but access to a processor’s own local memory is faster than access to another processor’s remote memory
Looks like a distributed machine, but the interconnection network is usually custom-designed switches and/or buses
Multiprocessor variants
Distributed shared memory (DSM) multiprocessors
Commodity hardware of a distributed memory multiprocessor, but all processors have the illusion of shared memory
Operating system handles accesses to remote memory “transparently” on behalf of the application
Relieves the application developer of the burden of memory management across the network
Multiprocessor variants
Shared memory machines connected together over a network (operating as a distributed memory or DSM machine)
[Diagram: several shared memory machines, each with its own network controller, connected by a network]
Shared memory multiprocessors
Major design issues
Cache coherence: ensuring that stores to cached data are seen by other processors
Synchronization: the coordination among processors accessing shared data
Memory consistency: the definition of when a processor must observe a write from another processor
Cache coherence problem
Two writeback caches becoming incoherent
[Diagram: CPU 0 and CPU 1, each with its own cache, both connected to main memory holding block A]
(1) CPU 0 reads block A: CPU 0’s cache and main memory each hold a copy of A
(2) CPU 1 reads block A: both caches and main memory hold copies of A
(3) CPU 0 writes block A: CPU 0’s cache holds the new value, while CPU 1’s cache and main memory hold old, out-of-date copies of block A
Cache coherence protocols
Ensures that writes to cached blocks are observable by all processors
Assigns a state field to all cached blocks
Defines actions for performing reads and writes to blocks in each state that ensure cache coherence
Actions are much more complicated than described here in a real machine with a split-transaction bus
MESI cache coherence protocol
Commonly used (or variant thereof) in shared memory multiprocessors
Idea is to ensure that, when a cache wants to write to a cache block, other remote caches invalidate their copies first
Each cache block is in one of four states (2 bits stored with each cache block)
Invalid: contents are not valid
Shared: other processor caches may have the same copy; main memory has the same copy
Exclusive: no other processor cache has a copy; main memory has the same copy
Modified: no other processor cache has a copy; main memory has an old copy
MESI cache coherence protocol
Actions on a load that results in a cache hit
Local cache actions
Read block
Remote cache actions
None
Actions on a load that results in a cache miss
Local cache actions
Request block from bus
If not in a remote cache, set state to Exclusive
If also in a remote cache, set state to Shared
Remote cache actions
Look up cache tags to see if the block is present
If so, signal the local cache that we have a copy, provide it if it is in state Modified, and change the state of our copy to Shared
MESI cache coherence protocol
Actions on a store that results in a cache hit
Local cache actions
Check state of block
If Shared, send an Invalidation bus command to all remote caches
Write the block and change the state to Modified
Remote cache actions
Upon receipt of an Invalidation command on the bus, look up cache tags to see if the block is present
If so, change the state of the block to Invalid
Actions on a store that results in a cache miss
Local cache actions
Simultaneously request block from bus and send an Invalidation command
After block received, write the block and set the state to Modified
Remote cache actions
Look up cache tags to see if the block is present
If so, signal the local cache that we have a copy, provide it if it is in state Modified, and change the state of our copy to Invalid
Cache coherence problem revisited
[Diagram: CPU 0 and CPU 1, each with its own cache, both connected to main memory holding block A]
(1) CPU 0 reads block A: CPU 0’s copy enters state Exclusive
(2) CPU 1 reads block A: CPU 0’s copy changes from Exclusive to Shared, and CPU 1’s copy enters state Shared
(3) CPU 0’s cache sends an Invalidate command for block A: CPU 1’s copy changes from Shared to Invalid
(4) CPU 0 writes block A: CPU 0’s copy changes from Shared to Modified; CPU 1’s copy remains Invalid
Synchronization
For parallel programs to share data, we must make sure that accesses to a given memory location are ordered
Example: a database of available inventory at a department store, simultaneously accessed from different store computers; only one computer must “win the race” to reserve a particular item
Solution
Architecture defines a special atomic swap instruction in which a memory location is tested for 0 and, if so, is set to 1
Software associates a lock variable with each piece of data that needs to be ordered (e.g., a particular class of merchandise) and uses the atomic swap instruction to try to set it
Software acquires the lock before modifying the associated data (e.g., reserving the merchandise)
Software releases the lock by setting it to 0 when done
Synchronization flowchart
[Flowchart: try the atomic swap on the lock; if the lock was already set, retry (“spinning”); once the swap succeeds, enter the critical section, then release the lock]
Synchronization and coherence example
Questions?