Cache Coherence “Can we do a better job of supporting cache coherence?” Ross Daly Chan Kim.
Definition of Cache Coherence (CC)
• “For any given memory location, at any given moment in time, there is either a single core that may write it (and that may also read it) or some number of cores that may read it.”
• “Data-Value Invariant: the value of a memory location at the start of an epoch is the same as the value of the memory location at the end of its last read-write epoch.”
- D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence, volume 6 of Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, May 2011.
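As a minimal sketch of the first quoted invariant, the check below takes a snapshot of per-core permissions for one memory location and tests whether it has either a single writer with no other holders, or only readers. The `Perm` enum and `swmr_holds` function are illustrative names, not taken from the slides or the primer.

```python
from enum import Enum


class Perm(Enum):
    NONE = 0        # core holds no copy of the location
    READ = 1        # core may only read the location
    READ_WRITE = 2  # core may read and write the location


def swmr_holds(perms):
    """Check one snapshot of per-core permissions for a single location.

    Returns True if there is at most one read-write core and, when there is
    one, no other core holds read permission at the same time.
    """
    writers = sum(1 for p in perms if p is Perm.READ_WRITE)
    readers = sum(1 for p in perms if p is Perm.READ)
    if writers == 0:
        return True  # any number of readers is fine
    return writers == 1 and readers == 0


print(swmr_holds([Perm.READ, Perm.READ, Perm.NONE]))   # many readers
print(swmr_holds([Perm.READ_WRITE, Perm.READ]))        # writer plus reader
```

The first call describes a read-only epoch and passes; the second mixes a writer with a reader and fails the check.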
Goals
• Improve cache coherence performance on multi-core/many-core systems.
• Scale the number of cores to increase performance.
• Scale the number of cores without increasing cache coherence complexity.
Xpoint Cache
• Motivation:
Xpoint: Architecture (2D)
[Figure: typical bus-based architecture vs. Xpoint architecture]
Xpoint: Architecture (3D)
Xpoint: Results
• 29x speedup for a 32-core system
• 45x speedup for a 64-core system
• 2.1x improvement over a 64-core conventional bus
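For context, the two speedups can be converted into parallel efficiency (speedup divided by core count). This is plain arithmetic on the slide's numbers and assumes both speedups are measured against a single-core baseline, which the slide does not state; `parallel_efficiency` is an illustrative helper.

```python
def parallel_efficiency(speedup, cores):
    """Speedup per core; 1.0 would be ideal linear scaling."""
    return speedup / cores


# (cores, speedup) pairs taken from the slide above.
for cores, speedup in [(32, 29.0), (64, 45.0)]:
    eff = parallel_efficiency(speedup, cores)
    print(f"{cores} cores: efficiency = {eff:.2f}")
# prints:
# 32 cores: efficiency = 0.91
# 64 cores: efficiency = 0.70
```

Under that assumption, efficiency drops from about 0.91 to about 0.70 when going from 32 to 64 cores.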
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Motivation
• Tracking every block in the directory entails huge storage requirements.
• A directory cache requires less storage, but it suffers from directory cache misses.
• Most of the accessed blocks (about 75% on average) are private.
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Private vs. Shared blocks
• Coarse-grain strategy (page granularity)
• The OS detects when a private page must become shared.
• Every newly loaded page starts as private.
• When another processor accesses a block in a private page, the page becomes shared.
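The classification rules above can be sketched as a small page-level state tracker. This is a simplified model under stated assumptions: `PageClassifier` and its fields are hypothetical names, and a real implementation would live in the OS page tables and TLBs rather than in Python dictionaries.

```python
class PageClassifier:
    """Tracks, per page, whether it is private to one core or shared."""

    def __init__(self):
        self.owner = {}      # page number -> first core that touched the page
        self.shared = set()  # pages that have transitioned to shared

    def access(self, core, page):
        """Record an access by `core` and return 'private' or 'shared'."""
        if page in self.shared:
            return "shared"
        if page not in self.owner:
            self.owner[page] = core  # every new page starts as private
            return "private"
        if self.owner[page] != core:
            self.shared.add(page)    # a second core touched the page
            return "shared"
        return "private"


clf = PageClassifier()
print(clf.access(core=0, page=7))  # first touch by core 0
print(clf.access(core=1, page=7))  # touch by a different core
```

In this model the transition is one-way: once a page is marked shared it stays shared, even if only the original core keeps using it.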
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Coherence Recovery Mechanism
• Flushing-based Recovery Mechanism
- Flushing all the blocks within a page may increase the miss rate.
• Updating-based Recovery Mechanism
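The slide gives no detail on either mechanism. As a rough illustration of why flushing can raise the miss rate, the sketch below assumes a toy private cache modeled as a set of block addresses and a hypothetical `flush_page` that evicts every cached block of a page when the page turns shared. `BLOCKS_PER_PAGE`, `ToyCache`, and `flush_page` are made-up names, and the updating-based mechanism is not modeled.

```python
BLOCKS_PER_PAGE = 64  # assumed, e.g. 4 KiB pages with 64-byte blocks


class ToyCache:
    """A private cache reduced to a set of resident block numbers."""

    def __init__(self):
        self.blocks = set()
        self.misses = 0

    def access(self, block):
        """Touch a block; count a miss and fill it if it is not resident."""
        if block not in self.blocks:
            self.misses += 1
            self.blocks.add(block)

    def flush_page(self, page):
        """Evict all resident blocks of `page`; return how many were evicted."""
        first = page * BLOCKS_PER_PAGE
        evicted = {b for b in self.blocks if first <= b < first + BLOCKS_PER_PAGE}
        self.blocks -= evicted
        return len(evicted)


cache = ToyCache()
for block in (0, 1, 2, 64):   # three blocks of page 0, one block of page 1
    cache.access(block)
cache.flush_page(0)           # page 0 becomes shared and is flushed
cache.access(1)               # would have hit; now an extra miss
print(cache.misses)
```

Blocks that the owner was still using are evicted along with the rest of the page, so the next access to them misses.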
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Results
• Directory caches can avoid tracking about 57% of blocks.
• Shortens the runtime of parallel applications by 15% at the same directory cache size, or maintains system performance with directory caches 8 times smaller.
Complexity-Effective Multicore Coherence
• Similarity
- Motivation
- Private and shared blocks
• Difference
- Simplifying the protocol
- Directory-less
Complexity-Effective Multicore Coherence:
Simplifying the protocol
• Dynamic write policy
- Write-back vs. write-through
• VIPS cache coherence protocol
- Valid/Invalid – Private/Shared
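One reading of these bullets is that each cached line carries only a valid bit and a private/shared bit, and the write policy is picked per line from the private/shared bit. The sketch below assumes write-back for private lines and write-through for shared lines; the slide names the two policies but does not spell out the mapping, so treat that as an assumption. `Line`, `write`, and `evict` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    shared: bool = False  # False means the line is private
    dirty: bool = False
    value: int = 0


def write(line, value, memory, addr):
    """Write a value, choosing the policy from the line's private/shared bit."""
    line.valid = True
    line.value = value
    if line.shared:
        memory[addr] = value  # assumed: shared lines write through
        line.dirty = False
    else:
        line.dirty = True     # assumed: private lines write back later


def evict(line, memory, addr):
    """Evict a line, writing it back only if it holds dirty data."""
    if line.valid and line.dirty:
        memory[addr] = line.value
    line.valid = False
    line.dirty = False


memory = {0: 0, 1: 0}
private_line = Line(shared=False)
shared_line = Line(shared=True)
write(private_line, 5, memory, 0)
write(shared_line, 9, memory, 1)
print(memory)  # address 0 is still stale, address 1 is already updated
```

In this model, memory sees a private write only at eviction, while a shared write is visible immediately.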
Complexity-Effective Multicore Coherence:
Directory-less
• Self-invalidation
- Readers are allowed to make unregistered copies of a memory location, as long as they promise to invalidate these at the next synchronization point.
- Does this follow cache coherency?
• Selective flushing
• Write-through at word granularity with a per-word dirty bit
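A minimal sketch of self-invalidation, assuming a toy model in which each core's L1 is a dictionary and a shared last-level cache (LLC) is another. The `Core` class and its methods are hypothetical; selective flushing and word-granularity dirty bits are not modeled.

```python
class Core:
    """A core with a private L1 in front of a shared last-level cache."""

    def __init__(self, llc):
        self.llc = llc             # shared dict: address -> value
        self.l1 = {}               # private copies: address -> value
        self.shared_addrs = set()  # shared addresses currently cached in L1

    def read(self, addr):
        """Read through the L1; the copy is not registered anywhere."""
        if addr not in self.l1:
            self.l1[addr] = self.llc[addr]
            self.shared_addrs.add(addr)
        return self.l1[addr]

    def write(self, addr, value):
        """Write through to the LLC; no invalidations are sent to other cores."""
        self.l1[addr] = value
        self.shared_addrs.add(addr)
        self.llc[addr] = value

    def sync(self):
        """Synchronization point: self-invalidate all shared copies."""
        for addr in self.shared_addrs:
            self.l1.pop(addr, None)
        self.shared_addrs.clear()


llc = {0: 1}
writer, reader = Core(llc), Core(llc)
reader.read(0)         # reader caches the old value
writer.write(0, 2)     # reader is not notified
stale = reader.read(0)
reader.sync()
fresh = reader.read(0)
print(stale, fresh)
```

Between the write and the reader's next synchronization point the reader sees the old value; after `sync()` it refetches from the LLC. That window is what the slide's question about coherence points at.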
Complexity-Effective Multicore Coherence:
Simplifying the protocol: Synchronization
• Synchronization relies on data races.
• A core executing an atomic instruction spins locally in its L1 until another core changes the condition.
• In this paper, a core does not send invalidation signals to other cores when it executes a write instruction.
• Solution?
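The problem raised by these bullets can be illustrated with a bounded spin loop. The sketch assumes the waiter's L1 copy is a separate dictionary that nobody invalidates; `spin_on_l1` is a hypothetical helper, and the sketch shows only the problem, not the paper's answer to it.

```python
def spin_on_l1(l1_copy, wanted, max_spins=1000):
    """Spin on a local L1 copy of a flag.

    Returns the number of spins before the wanted value was seen, or None if
    the loop gave up after `max_spins` iterations.
    """
    for spins in range(max_spins):
        if l1_copy["flag"] == wanted:
            return spins
    return None


llc = {"flag": 0}
l1_copy = dict(llc)  # the waiter caches the flag locally
llc["flag"] = 1      # another core writes through; no invalidation is sent
print(spin_on_l1(l1_copy, wanted=1))  # prints None
```

Because the local copy is never invalidated or refreshed, the waiter cannot observe the update no matter how long it spins.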
Complexity-Effective Multicore Coherence:
Simplifying the protocol: Results
• Outperformed the MESI directory protocol by 4.8%
• Reduced network energy consumption by 14.2%
• Simulated with 15 parallel benchmarks on 16 cores