An Accurate and Detailed Prefetching
Simulation Framework for gem5
Martí Torrents, Raúl Martínez, and Carlos Molina
Computer Architecture Department
UPC – BarcelonaTech
Outline
• Introduction
• Current Ruby prefetching solution
• Our solution
• Implementation details
• Case study
• Conclusions
Prefetching
• Reduces memory latency
• Brings data required by the CPU into a nearer cache
• Increases the hit ratio
• Implemented in many commercial processors
• Erroneous prefetching may produce a slowdown
• Simulation tools should include this capability
Current Ruby prefetching solution
[Figure: block diagram. Labeled components: L1 private cache memory, L1 cache controller, MESI protocol (states M, E, S, I), current prefetcher, prefetch queue.]
Our solution
[Figure: block diagram. Labeled components: L1 private cache memory, L1 cache controller, MESI protocol (states M, E, S, I), MOESI protocol (states M, O, E, S, I), current prefetcher, prefetch wrapper, prefetch queue, prefetch profiler, abstract prefetcher, and the specific prefetch engines: Tagged Prefetcher, Global History Buffer, Reference Prediction Table, and the current prefetch engine.]
[Figure: block diagram showing the same labeled components at both cache levels. The L1 private cache memory and the L2 shared cache bank each have a prefetch wrapper, a cache controller (L1 or L2) with the MOESI protocol (states M, O, E, S, I), a prefetch queue, a prefetch profiler, an abstract prefetcher, and a specific prefetch engine.]
Implementation details
Prefetch wrapper
• Receives the command-line options
• Creates and initializes the prefetch engine
• Manages the communication Controller -> Prefetcher
• And Prefetcher -> Prefetch queue -> Controller
• Collects statistics
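The wrapper's responsibilities can be illustrated with a minimal Python sketch. It is not the framework's code: the option names `--prefetcher` and `--block-size`, the `ENGINES` registry, and the `NextLineEngine` stand-in are all hypothetical and serve only to show the data flow Controller -> Prefetcher -> Prefetch queue -> Controller.

```python
import argparse
from collections import deque


class NextLineEngine:
    """Hypothetical stand-in engine: on a miss, propose the next block."""

    def __init__(self, block_size):
        self.block_size = block_size

    def observe_request(self, addr, is_hit):
        if is_hit:
            return []
        block = addr - addr % self.block_size
        return [block + self.block_size]


# Hypothetical registry mapping a command-line value to an engine class.
ENGINES = {"nextline": NextLineEngine}


class PrefetchWrapper:
    def __init__(self, argv):
        # Receive the command-line options (option names are assumptions).
        parser = argparse.ArgumentParser()
        parser.add_argument("--prefetcher", choices=sorted(ENGINES),
                            default="nextline")
        parser.add_argument("--block-size", type=int, default=64)
        opts = parser.parse_args(argv)
        # Create and initialize the selected prefetch engine.
        self.engine = ENGINES[opts.prefetcher](opts.block_size)
        self.queue = deque()
        self.stats = {"observed_misses": 0, "total": 0}

    def notify_access(self, addr, is_hit):
        """Controller -> Prefetcher: forward a cache access to the engine."""
        if not is_hit:
            self.stats["observed_misses"] += 1
        for candidate in self.engine.observe_request(addr, is_hit):
            self.queue.append(candidate)
            self.stats["total"] += 1

    def next_prefetch(self):
        """Prefetch queue -> Controller: hand over the oldest request."""
        return self.queue.popleft() if self.queue else None
```

A controller would call `notify_access` on every access and poll `next_prefetch` when it is free to issue a prefetch.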
Prefetch profiler
• Accumulates the statistics:
• observed_misses - numMissesObserved
• cancelled - numDroppedPrefetches
• completed - numPrefetchAccepted
• hit
• in_cache
• late - numPartialHits/numHits
• overflowed
• page_faults - numPagesCrossed
• total - numPrefetchRequested
• unuseful
• useful
• queue_merged_requests
• generated_prefetches_per_train - streams
• numMissedPrefetchBlocks
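A profiler of this kind is essentially a set of named counters. The Python sketch below reuses the counter names listed on the slide, but their exact semantics are assumed here, and the `accuracy` metric is an illustrative addition that the slides do not define.

```python
class PrefetchProfiler:
    # Counter names taken from the slide; their precise meaning is assumed.
    NAMES = (
        "observed_misses", "cancelled", "completed", "hit", "in_cache",
        "late", "overflowed", "page_faults", "total", "unuseful", "useful",
        "queue_merged_requests",
    )

    def __init__(self):
        self.counts = {name: 0 for name in self.NAMES}

    def record(self, name, amount=1):
        """Accumulate one event (or `amount` events) for a known counter."""
        if name not in self.counts:
            raise KeyError(f"unknown statistic: {name}")
        self.counts[name] += amount

    def accuracy(self):
        """Assumed metric: fraction of completed prefetches that were useful."""
        completed = self.counts["completed"]
        return self.counts["useful"] / completed if completed else 0.0

    def report(self):
        """One 'name value' line per counter, in declaration order."""
        return "\n".join(f"{name} {self.counts[name]}" for name in self.NAMES)
```

Rejecting unknown counter names in `record` is a design choice of the sketch: it turns a misspelled statistic into an immediate error instead of a silently missing number.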
Prefetch queue
• Collects the requests generated by the engine
• Checks each request for a page fault
• Merges repeated requests
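The three duties of the queue can be sketched in a few lines of Python. This is a sketch under stated assumptions: a "page fault" is modeled as a prefetch whose block lies in a different page than the access that triggered it, such requests are dropped, and the capacity, block size, and page size defaults are hypothetical.

```python
from collections import OrderedDict


class PrefetchQueue:
    def __init__(self, capacity=8, block_size=64, page_size=4096):
        self.capacity = capacity
        self.block_size = block_size
        self.page_size = page_size
        self.pending = OrderedDict()  # block address -> None, FIFO order
        self.stats = {"queue_merged_requests": 0, "page_faults": 0,
                      "overflowed": 0}

    def push(self, trigger_addr, prefetch_addr):
        """Try to enqueue a prefetch; return True only if it was queued."""
        block = prefetch_addr - prefetch_addr % self.block_size
        # Assumed page check: drop requests that leave the trigger's page.
        if block // self.page_size != trigger_addr // self.page_size:
            self.stats["page_faults"] += 1
            return False
        # Merge a request for a block that is already waiting.
        if block in self.pending:
            self.stats["queue_merged_requests"] += 1
            return False
        if len(self.pending) >= self.capacity:
            self.stats["overflowed"] += 1
            return False
        self.pending[block] = None
        return True

    def pop(self):
        """Return the oldest queued block address, or None if empty."""
        if not self.pending:
            return None
        block, _ = self.pending.popitem(last=False)
        return block
```

An ordered dictionary gives both constant-time duplicate detection and FIFO order, which is why the sketch uses it instead of a plain list.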
Abstract Prefetcher
• Works as an abstract class
• Must be inherited by the specific prefetcher
• Its virtual functions must be overridden:
  – Init: initialization function
    – Prefetcher size, distance, aggressiveness, etc.
  – Observe request: called on each cache access
    – Hit/miss, prefetch/no_prefetch, accessed address, etc.
  – Allocate: called when data is allocated in the cache
    – Same arguments as observe request
  – Deallocate: called when a block is evicted from the cache
    – Only the evicted address
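The interface above maps naturally onto an abstract base class. The following Python sketch mirrors the four functions named on the slide; the method signatures are assumptions, and the `TaggedPrefetcher` shown is a simplified textbook tagged prefetcher (prefetch the next block on a miss or on the first demand hit to a prefetched block), not the implementation used in the framework.

```python
from abc import ABC, abstractmethod


class AbstractPrefetcher(ABC):
    """Interface every specific prefetch engine must implement."""

    @abstractmethod
    def init(self, **params):
        """Configure the engine (size, distance, aggressiveness, ...)."""

    @abstractmethod
    def observe_request(self, addr, is_hit, is_prefetch=False):
        """Called on each cache access; returns addresses to prefetch."""

    @abstractmethod
    def allocate(self, addr, is_prefetch=False):
        """Called when a block is allocated in the cache."""

    @abstractmethod
    def deallocate(self, addr):
        """Called when a block is evicted from the cache."""


class TaggedPrefetcher(AbstractPrefetcher):
    def init(self, block_size=64, degree=1):
        self.block_size = block_size
        self.degree = degree
        self.tagged = set()  # prefetched blocks not yet used by a demand

    def _block(self, addr):
        return addr - addr % self.block_size

    def observe_request(self, addr, is_hit, is_prefetch=False):
        if is_prefetch:
            return []
        block = self._block(addr)
        first_use = block in self.tagged
        self.tagged.discard(block)
        # Trigger on a demand miss or on the first hit to a prefetched block.
        if is_hit and not first_use:
            return []
        return [block + i * self.block_size
                for i in range(1, self.degree + 1)]

    def allocate(self, addr, is_prefetch=False):
        if is_prefetch:
            self.tagged.add(self._block(addr))
        else:
            self.tagged.discard(self._block(addr))

    def deallocate(self, addr):
        self.tagged.discard(self._block(addr))
```

Because every hook is abstract, instantiating `AbstractPrefetcher` directly, or a subclass that omits one of the four methods, raises a `TypeError`.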
L1 Cache controller
• Notifies the wrapper of:
  – Cache accesses
  – Cache allocations
  – Cache evictions
• Reads from the prefetch queue
• A prefetch is issued only when there are no loads/stores to serve
• The protocol modification is similar to a Load operation
• Very similar to the current solution
[Figure: MOESI protocol states M, O, E, S, I.]
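The rule that prefetches are issued only when no loads or stores are waiting amounts to a strict priority between two queues. A minimal Python sketch of that arbitration follows; the function name `next_request` and the two queue arguments are hypothetical, and the real controller expresses this in the protocol definition, not in Python.

```python
from collections import deque


def next_request(demand_queue, prefetch_queue):
    """Pick the next request to issue, giving demand accesses priority.

    Returns a (kind, address) tuple, or None when both queues are empty.
    """
    if demand_queue:
        return ("demand", demand_queue.popleft())
    if prefetch_queue:
        return ("prefetch", prefetch_queue.popleft())
    return None


# Usage: the queued prefetch waits until both demand requests are served.
demand = deque([0x10, 0x20])
prefetch = deque([0x1040])
order = [next_request(demand, prefetch) for _ in range(4)]
```

With this ordering a prefetch can never delay a load or store that is already waiting when the controller picks its next request.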
L2 Cache controller
• Same as in the L1, but:
• L2 local hits
  – Some data is invalid in L2 but locally allocated
• L1_GETS does not store in L2
  – The protocol is modified to store prefetch requests in L2
• The prefetch queue may generate a request for another tile
  – The request is forwarded to the corresponding tile
[Figure: MOESI protocol states M, O, E, S, I.]
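Forwarding a prefetch to "the corresponding tile" requires a mapping from block address to home L2 bank. The slides do not state the mapping; the Python sketch below assumes simple block interleaving across the tiles, with hypothetical function names and default parameters.

```python
def home_tile(addr, num_tiles=16, block_size=64):
    """Assumed mapping: consecutive blocks are interleaved across tiles."""
    return (addr // block_size) % num_tiles


def route_prefetch(addr, local_tile, num_tiles=16, block_size=64):
    """Decide whether an L2 prefetch is handled locally or forwarded."""
    home = home_tile(addr, num_tiles, block_size)
    if home == local_tile:
        return ("local", home)
    return ("forward", home)
```

For example, with 16 tiles and 64-byte blocks, address 0x1040 is block 65 and maps to tile 1, so a prefetch for it generated on tile 1 stays local while one for 0x1080 is forwarded to tile 2.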
Case study: NoC-aware prefetch performance evaluation
• We tested 3 classical prefetch engines:
  – Tagged Prefetcher
  – Reference Prediction Table (RPT)
  – Global History Buffer (GHB)
• With the gem5 simulator, using:
  – 16 tiled x86 CPUs
  – L1 prefetchers
  – Ruby memory system
  – MOESI coherence protocol
  – Garnet network simulator
• PARSEC 2.1 benchmarks
Case study: Results
M. Torrents, R. Martínez, C. Molina. "Network Aware Performance Evaluation of Prefetching Techniques in CMPs". Simulation Modelling Practice and Theory (SIMPAT), 2014.
Conclusions
• Prefetching is important and must be simulated
• The current solution is OK
• Our solution goes one step further:
  – Easy to change or add prefetch engines
  – Detailed statistics about prefetching
  – Garnet can identify prefetching traffic
  – Useful for statistics or traffic manipulation
• The current tool can easily be included in the new solution
• The current solution is OK for researchers who do not work on prefetching
• Our tool is better suited to prefetching-related research