Hybrid Cache Architecture with Disparate Memory Technologies
Xiaoxia Wu†, Jian Li‡, Lixin Zhang‡, Evan Speight‡, Ram Rajamony‡, Yuan Xie†
† Pennsylvania State University
‡ IBM Austin Research Laboratory
Agenda
• Introduction
• Methodology
• Level based Hybrid Cache Architecture
• Region based Hybrid Cache Architecture
• 3D Hybrid Cache Stacking
• Conclusion
Introduction (1/3)
• Traditional SRAM-based cache architecture
– Limited size with CMP: cache-core balance
– Leakage power
– More cache levels: design overhead, coherence
– Non-uniform Cache Architecture (wire delay)
• Improve cache power-performance with emerging memory technologies, under the same chip area/footprint
– Embedded DRAM
– Magnetic RAM
– Phase Change RAM
– Three-dimensional space
Introduction (2/3)
• Different Memory Technologies

| | SRAM (6T) | DRAM (1T 1C) | MRAM (1T 1J) | PRAM (1T 1J) |
|---|---|---|---|---|
| Density (ratio) | Low (1) | High (4) | High (4) | High (16) |
| Dynamic Power | Low | Medium | Low for read, high for write | Medium for read, high for write |
| Leakage Power | High | Medium | Low | Low |
| Speed | Very fast | Fast | Fast for read, slow for write | Slow for read, slowest for write |
| Non-volatility | No | No | Yes | Yes |
| Scalability | Yes | Yes | Yes | Yes |
| Endurance | 10^16 | 10^16 | 10^15 | 10^8 |
Introduction (3/3)
• Motivation
[Figure: L2 Cache]
Methodology (1/2)
[Figure, panels (A) to (E): cache organizations under study.
Baseline: 8 cores, each with Core / L1, L2 (SRAM), and L3 (SRAM).
LHCA: Core / L1, L2 (SRAM), L3 (eDRAM, MRAM, or PRAM).
RHCA: Core / L1, L2 (SRAM) plus L2 (eDRAM, MRAM, or PRAM).
3DHCA: Core / L1, L2 (SRAM) plus L2 (eDRAM or MRAM), with a stacked PRAM layer used as L2, L3, or L4.]
Methodology (2/2)

| Cache | Density | Latency (cycles) | Dynamic Energy (nJ) | Static Power (W) |
|---|---|---|---|---|
| SRAM (1M) | 1 | 8 | 0.388 | 1.36 |
| eDRAM (4M) | 4 | 24 | 0.72 | 0.4 |
| MRAM (4M) | 4 | Read: 20, Write: 60 | Read: 0.4, Write: 2.3 | 0.15 |
| PRAM (16M) | 16 | Read: 40, Write: 200 | Read: 0.8, Write: 1.5 | 0.3 |

| Item | Setting value |
|---|---|
| Processor | 8-way issue out-of-order, 8-core, 4 GHz |
| L1 | 32KB DL1, 32KB IL1, 128B, 4-way, 1 R/W port |
| L2/L3/L4 | eDRAM, MRAM, PRAM & 3D Stacking |
| Memory | 400 cycles latency |

• Benchmark: SpecInt06, Specjbb, NAS, Bioperf, Parsec
• Simulator: SystemSim full system simulator
• Baseline: 256KB (L2) + 1MB (L3)
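To make the trade-off in the technology table concrete, the short Python sketch below derives two figures from it: static power per MB of capacity, and a mean access latency for a given share of writes. The numbers in `TECH` are copied from the table; the function names and the 25% write fraction in the usage lines are illustrative assumptions, not values from the slides.

```python
# Values copied from the technology table; names and structure are illustrative.
TECH = {
    # name: (capacity_mb, read_cycles, write_cycles, static_power_w)
    "SRAM":  (1, 8, 8, 1.36),
    "eDRAM": (4, 24, 24, 0.4),
    "MRAM":  (4, 20, 60, 0.15),
    "PRAM":  (16, 40, 200, 0.3),
}


def static_power_per_mb(name):
    """Static power (W) divided by the capacity that fits in the same area."""
    capacity_mb, _, _, static_w = TECH[name]
    return static_w / capacity_mb


def mean_latency(name, write_fraction):
    """Average access latency in cycles for a hypothetical read/write mix."""
    _, read_cycles, write_cycles, _ = TECH[name]
    return (1 - write_fraction) * read_cycles + write_fraction * write_cycles


if __name__ == "__main__":
    # write_fraction=0.25 is an assumed workload mix, not a measured one.
    for name in TECH:
        print(name, static_power_per_mb(name), mean_latency(name, 0.25))
```

Under these assumptions the denser technologies cut static power per MB sharply, while MRAM and PRAM become slower as the write share grows, which is the tension the hybrid designs try to balance.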
LHCA (Level based Hybrid Cache Architecture)
RHCA (Region based Hybrid Cache Architecture, 1/7)
• Mutually exclusive regions
• Parallel search, unified LRU
• Fast and slow regions in one cache level
• Intra-cache data movement policy
– Move frequently used data to the fast region
• Drowsy* RHCA
– Keep the slow region in drowsy mode
– The drowsy mode can power-gate the non-volatile memory cells and/or the corresponding peripheral CMOS logic.
– For DRAM, the primitive drowsy mode is used.
* Drowsy mode: a method that optimizes power consumption by adjusting the power delivered to the cells while keeping cache performance within a given range [15]
RHCA (Region based Hybrid Cache Architecture, 2/7)
• Intra-cache data movement policy
– On a cache hit, if the corresponding cache line resides in the fast region, its sticky bit is always set.
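The movement policy can be sketched for a single cache set as follows. Only two things come from the slides: frequently used data moves to the fast region, and a fast-region hit sets the line's sticky bit. Everything else is an assumption made for illustration: the per-line hit counter with a `threshold`, the rule that a set sticky bit blocks one swap and is then cleared, the fill order on a miss, and all class and method names.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so list.remove() targets the exact line
class Line:
    tag: int
    sticky: bool = False  # meaningful while the line sits in the fast region
    hits: int = 0         # assumed hit counter for slow-region lines
    last_use: int = 0     # timestamp for the unified LRU


class RHCASet:
    """One cache set split into a fast and a slow region (illustrative only)."""

    def __init__(self, fast_ways=1, slow_ways=3, threshold=2):
        self.fast, self.slow = [], []
        self.fast_ways, self.slow_ways = fast_ways, slow_ways
        self.threshold = threshold  # assumed promotion threshold
        self.clock = 0

    def access(self, tag):
        """Return 'fast', 'slow', or 'miss' depending on where the tag was found."""
        self.clock += 1
        for line in self.fast:
            if line.tag == tag:
                line.sticky = True  # from the slide: a fast-region hit sets the sticky bit
                line.last_use = self.clock
                return "fast"
        for line in self.slow:
            if line.tag == tag:
                line.hits += 1
                line.last_use = self.clock
                if line.hits >= self.threshold:
                    self._try_promote(line)
                return "slow"
        self._fill(Line(tag, last_use=self.clock))
        return "miss"

    def _try_promote(self, line):
        if len(self.fast) < self.fast_ways:
            self.slow.remove(line)
            self._place_fast(line)
            return
        victim = min(self.fast, key=lambda l: l.last_use)
        if victim.sticky:
            victim.sticky = False  # assumption: sticky bit blocks one swap, then clears
            return
        # Swap: the fast-region victim is demoted, the hot slow line is promoted.
        self.fast.remove(victim)
        self.slow.remove(line)
        victim.hits = 0
        self.slow.append(victim)
        self._place_fast(line)

    def _place_fast(self, line):
        line.hits, line.sticky = 0, False
        self.fast.append(line)

    def _fill(self, line):
        # Assumed fill policy: use free ways first, else evict the unified-LRU line.
        if len(self.fast) < self.fast_ways:
            self.fast.append(line)
        elif len(self.slow) < self.slow_ways:
            self.slow.append(line)
        else:
            victim = min(self.fast + self.slow, key=lambda l: l.last_use)
            region = self.fast if any(l is victim for l in self.fast) else self.slow
            region[region.index(victim)] = line
```

With `RHCASet(fast_ways=1, slow_ways=2, threshold=2)`, a line that is hit repeatedly in the slow region displaces a fast-region line that has not been reused, while a fast-region line that was recently hit survives one swap attempt before it becomes eligible for demotion.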
RHCA (Region based Hybrid Cache Architecture, 3/7)
• Structure for swap operation
RHCA (Region based Hybrid Cache Architecture, 4/7)

| RHCA (fast+slow) | Fast region | L2 total size (latency) |
|---|---|---|
| SRAM+eDRAM | 256KB (6 cycles) | 4MB (24 cycles) |
| SRAM+MRAM | 256KB (6 cycles) | 4MB (r: 20, w: 60) |
| SRAM+PRAM | 256KB (6 cycles) | 16MB (r: 40, w: 200) |

• Slow region: 256KB/bank, 1 r/w port, block size 128B, associativity: 16, 16, 64
• Each RHCA is 256KB smaller than the corresponding LHCA
– Avoids an odd-sized cache
• DNUCA (Dynamic Non-Uniform Cache Architecture) policy: more fine-grained, bank-based, same size; moves a line to a bank closer to the CPU on each hit
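The sizing in the RHCA table can be checked with a few lines of arithmetic. The sketch below assumes that "L2 total size" includes the 256KB fast region, so the slow region is the remainder; the function names are illustrative.

```python
BANK_KB = 256  # from the slide: 256KB per slow-region bank
FAST_KB = 256  # from the table: 256KB fast (SRAM) region

# L2 total size in MB for each RHCA configuration, copied from the table.
TOTAL_MB = {"SRAM+eDRAM": 4, "SRAM+MRAM": 4, "SRAM+PRAM": 16}


def slow_region(total_mb):
    """Return (slow-region size in KB, number of 256KB banks).

    Assumes the total size includes the fast region.
    """
    slow_kb = total_mb * 1024 - FAST_KB
    return slow_kb, slow_kb // BANK_KB


def lhca_total_kb(total_mb):
    """Size of the matching LHCA, which the slide says is 256KB larger."""
    return total_mb * 1024 + 256


if __name__ == "__main__":
    for name, total_mb in TOTAL_MB.items():
        print(name, slow_region(total_mb), lhca_total_kb(total_mb))
```

Under this reading, a 4MB RHCA has a 3840KB slow region in 15 banks and a 16MB RHCA has a 16128KB slow region in 63 banks, while the matching LHCA (256KB L2 plus a 4MB or 16MB L3) would be the odd-sized 4352KB or 16640KB.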
RHCA (Region based Hybrid Cache Architecture, 5/7)
[Figure panels: eDRAM, MRAM, PRAM; hit ratio (SRAM-eDRAM)]
RHCA (Region based Hybrid Cache Architecture, 6/7)
• Multi-core
• Wake-up latency
[Figure: SRAM-eDRAM]
RHCA (Region based Hybrid Cache Architecture, 7/7)
• Threshold
• Replacement and insertion policy
[Figure: Baseline: LRU]
3D Hybrid Cache Stacking
[Figure, panels (C), (D), (E): Core / L1 with L2 (SRAM) plus L2 (eDRAM or MRAM), and a stacked PRAM layer used as L2, L3, or L4]
• 3DHCA-C (3D LHCA): 256KB L2 SRAM, 4MB L3 eDRAM, 32MB L4 PRAM
• 3DHCA-D (3D RHCA): 32MB L2 with fast, middle, and slow regions
– Data in the slow region can be moved to the fast and middle regions
• 3DHCA-E (LHCA+RHCA): 4MB L2 with fast+slow regions, 32MB L3 PRAM
3D Hybrid Cache Stacking
Conclusion
• Hybrid cache architecture is a promising way to improve cache power-performance under the same chip area/footprint
• RHCA and LHCA achieve better power-performance than an SRAM-based design
• RHCA outperforms LHCA with minimal hardware support
• 3DHCA achieves better performance than LHCA and RHCA while still maintaining lower power than the 2D SRAM baseline