ARI. HiPEAC 2014

1. Viacheslav Fedorov, Sheng Qiu, Narasimha Reddy, Paul Gratz (Texas A&M University). ARI: Adaptive Replacement and Insertion. HiPEAC 2013, Vienna, Austria

2. Conventional Main Memory. Usually we only care about speeding up the cache miss path. [Diagram labels: Core 0, Core 1, Core 2, Core 3, L2$, L3$, Main Memory]

3. Main Memory: Trends. New memories are emerging; DRAM is not dense enough; replace or augment DRAM. [Diagram labels: Core 0, Core 1, Core 2, Core 3, L2$, L3$, DRAM, PCM, DRAM cache]

4. PCM Technology. Based on chalcogenide glass; exploits two phases, amorphous and crystalline; higher density than DRAM; non-volatile. Image: Stanford NanoHeat Lab

5. DRAM vs PCM
DRAM: writeback-agnostic; write buffers cushion the impact of writebacks; state-of-the-art policies target cache misses.
PCM: high write latency (write buffers insufficient); high write energy (mobile, embedded devices?); low cell endurance (limited write cycles?).
Parameter | DRAM | PCM | PCM relative to DRAM
Row Read | 210 mW | 78 mW | 0.3x
Row Write | 195 mW | 773 mW | 4x
Activate | 75 mW | 25 mW | 0.3x
Standby | 90 mW | 45 mW | 0.5x
Refresh | 4 mW | 0 mW | 0x
Initial Row Read | 15 ns | 28 ns | 2x
Row Write | 22 ns | 150 ns | 7x
Same Row R/W | 15 ns | 15 ns |

6. Outline. Introduction; Motivation; ARI: Adaptive Replacement and Insertion; Evaluation; Summary; Conclusion

7. Motivation. PCM is attractive as a Main Memory, but PCM does not favor writes: high energy, high latency, low write cycle tolerance. Solution: reduce writes into Main Memory by modifying LLC policies to reduce writebacks. Mind the miss rate!

8. Application behavior in High-Associativity Caches. Bi-polar block distribution due to the LRU policy: 'hot' blocks tend to group towards the MRU side of a set, 'cold' blocks towards the LRU side. Hot blocks have a higher hit ratio; cold blocks tend to have similar hit ratios. [Figure: Hit distribution in a high-associativity cache (16-way). X-axis: position in LRU stack, MRU to LRU; y-axis: % hit rate; regions: 'Hot' and 'Cold']
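
The hit distribution described on this slide can be measured by recording, for each hit, the LRU stack position of the block. The following is a minimal sketch for a single set, not the authors' tooling; the function name and the synthetic trace (two reused tags mixed with a streaming scan) are assumptions made for illustration.

```python
from collections import Counter

def lru_hit_positions(trace, ways=16):
    """Count hits per LRU stack position (0 = MRU) for one cache set."""
    stack = []                  # stack[0] is MRU, stack[-1] is LRU
    hits = Counter()
    for tag in trace:
        if tag in stack:
            pos = stack.index(tag)
            hits[pos] += 1      # remember where in the stack the hit landed
            stack.pop(pos)
        elif len(stack) == ways:
            stack.pop()         # miss in a full set: evict the LRU block
        stack.insert(0, tag)    # the accessed block becomes MRU
    return hits

# Hypothetical trace: tags "a" and "b" are reused, the numeric tags stream through once.
trace = []
for i in range(100):
    trace += ["a", "b", 1000 + i]

hits = lru_hit_positions(trace)
print(sorted(hits.items()))
```

In this toy trace every hit lands near the MRU end of the stack, while the streaming blocks drift to the LRU end without ever being reused.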

9. Static LLC policies. Based on the observed hot-cold distribution. 16-way cache: 16 static policies, xH16, each replacing any clean block among the (16 - x) low-hit blocks. Drawbacks: no single static policy is good for all applications; fewer writebacks => more cache misses when replacing hot blocks.
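
A static xH16 policy can be sketched as a victim-selection routine over one LRU stack. This is an illustrative sketch only: the `Block` type and `pick_victim` name are hypothetical, the slide says "any" clean block so scanning from the LRU end is an assumption, and so is the fallback to plain LRU when every low-hit block is dirty.

```python
from dataclasses import dataclass

@dataclass
class Block:
    tag: int
    dirty: bool = False

def pick_victim(stack, x):
    """Return the stack index of the block to evict from a full set.

    stack[0] is MRU and stack[-1] is LRU; the last len(stack) - x
    positions are treated as the low-hit region of the xH16 policy.
    """
    # Assumption: scan the low-hit region from the LRU end towards its top.
    for pos in range(len(stack) - 1, x - 1, -1):
        if not stack[pos].dirty:
            return pos          # clean block: evicting it causes no writeback
    # Assumption: if every low-hit block is dirty, fall back to plain LRU.
    return len(stack) - 1

# 16-way set whose two LRU-most blocks are dirty.
stack = [Block(tag) for tag in range(16)]
stack[14].dirty = stack[15].dirty = True

victim_4h16 = pick_victim(stack, 4)     # 12-block low-hit region
victim_14h16 = pick_victim(stack, 14)   # 2-block low-hit region, all dirty
print(victim_4h16, victim_14h16)
```

With a small x the policy finds a clean victim more easily and avoids the writeback, but it may reach into blocks that still receive hits, which is the miss-rate drawback the slide points out.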

10. Enter ARI: Adaptive Replacement and Insertion. Goal: reduce LLC writebacks! Keep the miss rate lower than conventional policies. How? Do not replace dirty cache blocks (as long as possible); place fresh incoming blocks into the LLC smartly; dynamically choose the best policy.

11. ARI: Operation. Evict clean blocks from the Low-Hit region; insert new blocks at the top of the Low-Hit region. [Figure: % hit rate vs. position in LRU stack (MRU to LRU), split into High-Hit and Low-Hit regions]
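
The insertion half of this slide can be illustrated on a toy LRU stack. The sketch below assumes the victim has already been removed, that stack position x marks the top of the Low-Hit region, and that a hit promotes the block to MRU; the function names are hypothetical and the slides do not confirm the promotion rule.

```python
def touch(stack, tag):
    """On a hit, promote the block to MRU (stack[0]). Returns True on a hit."""
    if tag in stack:
        stack.remove(tag)
        stack.insert(0, tag)    # assumption: hits promote straight to MRU
        return True
    return False

def insert_low_hit_top(stack, tag, x):
    """Place a new block at stack position x, the top of the low-hit region."""
    stack.insert(min(x, len(stack)), tag)

# Toy 8-way set with x = 4, shown after the victim has been evicted.
stack = ["a", "b", "c", "d", "e", "f", "g"]
insert_low_hit_top(stack, "n", 4)
after_insert = list(stack)      # "n" sits below the four high-hit blocks

was_hit = touch(stack, "n")     # a later reuse moves "n" to MRU
print(after_insert, stack, was_hit)
```

Inserting below the High-Hit region means a block that is never reused does not push hot blocks towards eviction, while a block that is reused still reaches the MRU side on its first hit.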

12. ARI: Operation. Application hit distributions are not static, so the policy adapts dynamically in epochs: emulate various static thresholds in LLC tags and pick the best one for the next epoch (25k LLC accesses), using a Misses + Writebacks metric. [Figure: % hit rate vs. position in LRU stack, MRU to LRU]
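
The epoch-based selection can be sketched as a small bookkeeping class. This is a hedged illustration rather than the hardware design: the class and method names are hypothetical, the candidate thresholds in the usage lines are arbitrary, ties are broken by candidate order, and the cost is the unweighted Misses + Writebacks sum named on the slide.

```python
class EpochSelector:
    """Pick the emulated threshold with the lowest misses + writebacks per epoch."""

    def __init__(self, thresholds, epoch_len=25_000):
        self.epoch_len = epoch_len
        self.cost = {x: 0 for x in thresholds}  # per-candidate misses + writebacks
        self.accesses = 0
        self.active = thresholds[0]             # assumption: start with the first candidate

    def record(self, x, misses=0, writebacks=0):
        """Account events observed by the emulation of candidate threshold x."""
        self.cost[x] += misses + writebacks

    def tick(self):
        """Call once per LLC access; returns the threshold to use from now on."""
        self.accesses += 1
        if self.accesses == self.epoch_len:
            self.active = min(self.cost, key=self.cost.get)
            self.cost = dict.fromkeys(self.cost, 0)   # start a fresh epoch
            self.accesses = 0
        return self.active

# Tiny epoch of 3 accesses so the switch is visible.
sel = EpochSelector([4, 10, 14], epoch_len=3)
sel.record(4, misses=5)
sel.record(10, misses=1, writebacks=1)
sel.record(14, writebacks=9)
chosen = [sel.tick() for _ in range(3)]
print(chosen)
```

The active threshold changes only at the epoch boundary, so the main tag array follows one policy for a whole epoch while the candidates are evaluated on the side.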

13. ARI: Implementation. Emulate static thresholds in shadow tags; adapt to the hit distribution dynamically. [Diagram labels: Core 0, Core 1, Core 2, Core 3, L2$, L3$; Tag Array, Data Array, Shadow Tag Array; thresholds 4H16, 10H16, 14H16]

15. Methodology. gem5 + DRAMSim2 simulators; NVIDIA Tegra-like out-of-order, dual-issue CPU; SPEC2006 and PARSEC suites. Compared against state-of-the-art policies: ARI beats them in writeback reduction and is nearly identical in total performance.
System | Single core | Multicore
L1 cache | 32KB I + 64KB D, 2-way, LRU, 64B block | 32KB I + 64KB D, 2-way, LRU, 64B block
L2 cache | 256KB, 8-way, LRU, 64B block | 256KB, 8-way, LRU, 64B block (private)
L3 cache | 2MB, 16-way, LRU, 64B block | 16MB, 16-way, LRU, 64B block (shared)
Main memory | 4GB, DDR3-1333 DRAM, 32-entry write buffer | 4GB, DDR3-1333 DRAM, 32-entry write buffer

16. ARI: Writeback reduction. ARI beats the competition: 33% writeback (WB) reduction. [Figure: Writeback improvement, normalized to LRU policy.] Compared policies: DIP (M. Qureshi et al., ISCA '09), DBLK (S. Khan et al., MICRO '10), RRIP (A. Jaleel et al., ISCA '10)

17. ARI: Miss reduction. ARI achieves a 4.7% miss reduction. [Figure: Miss rate improvement, normalized to LRU policy.] Compared policies: DIP (M. Qureshi et al., ISCA '09), DBLK (S. Khan et al., MICRO '10), RRIP (A. Jaleel et al., ISCA '10)

18. ARI: Performance improvement. ARI yields a 5% IPC improvement on average. [Figure: IPC improvement, normalized to LRU policy]

19. ARI: Dynamic behavior. ARI adapts to program phases and achieves fewer writebacks than the best static policy. [Figures: writebacks for the soplex and mcf applications, SPEC 2006]

20. ARI: Multicore applications

21. ARI: PCM lifetime improvement. ARI facilitates the use of PCM as Main Memory. [Figure: % PCM lifetime improvement, 0% to 60%, for DIP, DBLK, RRIP, and ARI; annotation: "Decrease lifetime for several apps"]

22. ARI: PCM lifetime improvement

23. ARI: Hardware overhead. 8 sets shadowed per LLC bank (x8); p*2 shadow tags (we use p=9); 14kB storage overhead in a 16MB LLC; 15-bit epoch counter; performance counters and adders. Not on the critical path; can be designed for low power.
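
As a quick arithmetic check, the 15-bit epoch counter matches the epoch length of 25k LLC accesses given on the operation slide. The snippet assumes the counter simply counts accesses up to the epoch length; the constant name is hypothetical.

```python
EPOCH_ACCESSES = 25_000     # epoch length from the operation slide

# Smallest counter width whose range covers the epoch length.
counter_bits = EPOCH_ACCESSES.bit_length()
print(counter_bits, 2 ** 14 < EPOCH_ACCESSES <= 2 ** 15)
```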

25. ARI: Summary. 33% writeback reduction; 4.7% cache miss rate reduction; 9% less Main Memory traffic; system IPC boost of 5%. Enabling PCM as Main Memory: 50% lifetime improvement. Win-win.

26. Conclusion. DRAM is hitting a scalability wall; new memories and architectures are being proposed; we target PCM as main memory. We propose ARI (Adaptive Replacement and Insertion): a simple scheme that reduces writebacks to main memory and boosts PCM performance and lifetime.

27. Thank you! Questions?

28. Backup Slides

29. Related Work: PCM. G. Dhiman et al., "PDRAM: A hybrid PRAM and DRAM main memory system," DAC '09; M. K. Qureshi et al., "Enhancing Lifetime and Security of PCM-based Main Memory with Start-Gap Wear Leveling," MICRO '09; B. C. Lee et al., "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA '09; M. K. Qureshi et al., "Scalable high performance main memory system using phase-change memory technology," ISCA '09; A. P. Ferreira et al., "Increasing PCM main memory lifetime," DATE '10

30. Related Work: PCM. N. H. Seong et al., "Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping," ISCA '10; H. Yoon et al., "Row buffer locality aware caching policies for hybrid memories," ICCD '12; Stuecheli et al., "The Virtual Write Queue: Coordinating DRAM and Last-Level Cache Policies," ISCA '10; M. K. Qureshi & G. H. Loh, "Fundamental latency trade-off in architecting DRAM caches: Outperforming impractical SRAM-tags with a simple and practical design," MICRO '12

31. ARI: Insertion impact

32. ARI: Total Memory Traffic. [Figure: Total memory traffic, Misses + Writebacks, normalized to LRU, for 4H16 and ARI; y-axis from 0.6 to 1.4. Benchmarks: gcc, bzip, bwaves, mcf, milc, zeus, gromacs, cactusADM, leslie3d, namd, gobmk, soplex, hmmer, sjeng, GemsFDTD, h264ref, astar, sphinx3, avg]
