An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez,...

Post on 17-Jan-2018


An Accurate and Detailed Prefetching

Simulation Framework for gem5

Martí Torrents, Raúl Martínez, and Carlos Molina

martit@ac.upc.edu
Computer Architecture Department
UPC – BarcelonaTech


Outline

• Introduction
• Current Ruby prefetching solution
• Our solution
• Implementation details
• Case study
• Conclusions


Prefetching

• Reduces memory latency
• Brings data required by the CPU into a nearer cache
• Increases the hit ratio
• Implemented in many commercial processors
• Erroneous prefetching may cause slowdown
• Simulation tools should include this capability


Current Ruby prefetching solution

[Diagram: an L1 cache controller running the MESI protocol (M, E, S, I), with the current prefetcher and its prefetch queue, attached to the L1 private cache memory.]


Our solution

[Diagram. Current design: L1 cache controller running the MESI protocol (M, E, S, I), with the current prefetcher and its prefetch queue, on the L1 private cache memory. Proposed design: MOESI protocol (M, O, E, S, I) with a prefetch wrapper, a prefetch queue, a prefetch profiler, and an abstract prefetcher. Specific prefetch engines shown: Tagged Prefetcher, Global History Buffer, Reference Prediction Table, and the current prefetch engine.]


Our solution: L1 and L2 caches

[Diagram: the same set of components is shown at both cache levels. The L2 shared cache bank has an L2 cache controller (MOESI protocol: M, O, E, S, I), a prefetch wrapper, a prefetch queue, a prefetch profiler, an abstract prefetcher, and a specific prefetch engine. The L1 private cache memory has an L1 cache controller (MOESI protocol), a prefetch wrapper, a prefetch queue, a prefetch profiler, an abstract prefetcher, and a specific prefetch engine.]


Implementation details: Prefetch wrapper

• Receives command line options
• Creates and initializes the prefetch engine
• Manages communication from the controller to the prefetcher
• And from the prefetcher, through the prefetch queue, back to the controller
• Collects statistics
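
The wrapper's role can be illustrated with a minimal C++ sketch. It is not the framework's code: the class name `PrefetchWrapper`, the engine names "none" and "nextline", and every method name below are hypothetical, and the engine is reduced to a single callable. The sketch only shows the plumbing described above: pick an engine from an option string, pass controller accesses to it, queue what it generates, and count events.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

using Addr = std::uint64_t;
// An engine maps one observed access to zero or more prefetch candidates.
using Engine = std::function<std::vector<Addr>(Addr addr, bool hit)>;

// Hypothetical wrapper: all names are illustrative assumptions.
class PrefetchWrapper {
  public:
    // engineName stands in for a value parsed from the command line.
    PrefetchWrapper(const std::string& engineName, Addr blockBytes) {
        if (engineName == "none") {
            engine_ = [](Addr, bool) { return std::vector<Addr>{}; };
        } else if (engineName == "nextline") {
            // Toy engine: on a miss, ask for the next sequential block.
            engine_ = [blockBytes](Addr a, bool hit) {
                std::vector<Addr> out;
                if (!hit) out.push_back(a / blockBytes * blockBytes + blockBytes);
                return out;
            };
        } else {
            throw std::invalid_argument("unknown prefetch engine: " + engineName);
        }
    }

    // Controller -> prefetcher: forward an access, queue the results.
    void notifyAccess(Addr addr, bool hit) {
        ++observed_;
        for (Addr target : engine_(addr, hit)) {
            queue_.push_back(target);
            ++generated_;
        }
    }

    // Prefetch queue -> controller.
    bool hasRequest() const { return !queue_.empty(); }
    Addr nextRequest() {  // precondition: hasRequest()
        const Addr a = queue_.front();
        queue_.pop_front();
        return a;
    }

    // Statistics collected along the way.
    std::uint64_t observed() const { return observed_; }
    std::uint64_t generated() const { return generated_; }

  private:
    Engine engine_;
    std::deque<Addr> queue_;
    std::uint64_t observed_ = 0;
    std::uint64_t generated_ = 0;
};
```

With 64-byte blocks, a miss at 0x1010 under the toy "nextline" engine queues a request for 0x1040, which the controller can then read with `nextRequest()`.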


Implementation details: Prefetch profiler

• Accumulates the statistics

• observed_misses – numMissesObserved
• cancelled – numDroppedPrefetches
• completed – numPrefetchAccepted
• hit
• in_cache
• late – numPartialHits/numHits
• overflowed
• page_faults – numPagesCrossed
• total – numPrefetchRequested
• unuseful
• useful
• queue_merged_requests
• generated_prefetches_per_train – streams
• numMissedPrefetchBlocks
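
As a rough illustration of how a few of these counters could be accumulated, here is a minimal C++ sketch. The counter names `total`, `completed`, `useful`, `late`, and `unuseful` come from the list above, but the event hooks and the classification rules are assumptions made for this example: a prefetch is counted as useful if a demand access finds the block already filled, late if the demand access arrives while the prefetch is still in flight, and unuseful if the block is evicted before any demand access. The slides do not define these terms, so the real profiler may differ.

```cpp
#include <cstdint>
#include <unordered_map>

using Addr = std::uint64_t;

// Hypothetical profiler: hook names and classification rules are assumptions.
struct PrefetchProfiler {
    std::uint64_t total = 0;      // prefetches issued
    std::uint64_t completed = 0;  // prefetched blocks that arrived
    std::uint64_t useful = 0;     // demanded after arriving
    std::uint64_t late = 0;       // demanded while still in flight
    std::uint64_t unuseful = 0;   // evicted without ever being demanded

    enum class State { InFlight, Resident, DemandedInFlight };
    std::unordered_map<Addr, State> tracked;

    // A prefetch for this block was sent to the memory system.
    void issued(Addr block) {
        ++total;
        tracked[block] = State::InFlight;
    }

    // The prefetched data arrived in the cache.
    void filled(Addr block) {
        auto it = tracked.find(block);
        if (it == tracked.end() || it->second == State::Resident) return;
        ++completed;
        if (it->second == State::InFlight) {
            it->second = State::Resident;  // wait for a demand access
        } else {
            tracked.erase(it);             // already counted as late
        }
    }

    // A load or store touched this block.
    void demandAccess(Addr block) {
        auto it = tracked.find(block);
        if (it == tracked.end()) return;
        if (it->second == State::InFlight) {
            ++late;
            it->second = State::DemandedInFlight;
        } else if (it->second == State::Resident) {
            ++useful;
            tracked.erase(it);
        }
    }

    // The block left the cache.
    void evicted(Addr block) {
        auto it = tracked.find(block);
        if (it != tracked.end() && it->second == State::Resident) {
            ++unuseful;
            tracked.erase(it);
        }
    }
};
```

Each prefetched block is tracked only until its outcome is known, so the map stays small even in long runs.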


Implementation details: Prefetch queue

• Collects the requests generated by the engine
• Checks each request for page fault errors
• Merges repeated requests
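
The queue's three duties can be sketched in C++ as follows. This is an illustration under stated assumptions, not the framework's implementation: the class and method names are hypothetical, the "page fault" check is interpreted as rejecting a candidate that falls in a different page than the access that triggered it, and the bounded capacity is added to show where an overflow counter would be incremented.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_set>

using Addr = std::uint64_t;

// Hypothetical prefetch queue: names and policies are assumptions.
class PrefetchQueue {
  public:
    enum class Result { Queued, Merged, PageCrossed, Overflowed };

    PrefetchQueue(std::size_t capacity, Addr pageBytes, Addr blockBytes)
        : capacity_(capacity), pageBytes_(pageBytes), blockBytes_(blockBytes) {}

    // triggerAddr: the access that trained the engine.
    // target: the address the engine wants to prefetch.
    Result push(Addr triggerAddr, Addr target) {
        const Addr block = target / blockBytes_ * blockBytes_;
        // Reject candidates that leave the trigger's page.
        if (block / pageBytes_ != triggerAddr / pageBytes_)
            return Result::PageCrossed;
        // Merge with an identical request that is already pending.
        if (pending_.count(block) != 0) return Result::Merged;
        if (fifo_.size() >= capacity_) return Result::Overflowed;
        fifo_.push_back(block);
        pending_.insert(block);
        return Result::Queued;
    }

    bool empty() const { return fifo_.empty(); }

    // Oldest pending block address; precondition: !empty().
    Addr pop() {
        const Addr block = fifo_.front();
        fifo_.pop_front();
        pending_.erase(block);
        return block;
    }

  private:
    std::size_t capacity_;
    Addr pageBytes_;
    Addr blockBytes_;
    std::deque<Addr> fifo_;            // issue order
    std::unordered_set<Addr> pending_; // fast duplicate check
};
```

The return value of `push` maps naturally onto profiler counters such as queue_merged_requests, page_faults, and overflowed.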


Implementation details: Abstract prefetcher

• Works as an abstract class
• Each specific prefetcher must inherit from it
• Its virtual functions must be overridden:

– Init: initialization function
    – Prefetcher size, distance, aggressiveness, etc.

– Observe request: called on each cache access
    – Hit/miss, prefetch/no_prefetch, accessed address, etc.

– Allocate: called when data is allocated in the cache
    – Same as observe request

– Deallocate: called when a block is evicted from the cache
    – Only the evicted address
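
The four hooks listed above can be sketched as a C++ abstract class, with a tagged prefetcher as one possible engine. The signatures are assumptions chosen for illustration (the slides give only the hook names and the kind of information passed), and `allocate` is simplified to take less information than the slide describes. The tagged policy shown is the textbook one: prefetch the following block(s) on a demand miss, or on the first demand hit to a block that a prefetch brought in.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical interface: names and signatures are illustrative only.
class AbstractPrefetcher {
  public:
    virtual ~AbstractPrefetcher() = default;
    // Set engine parameters (size, distance, aggressiveness, ...).
    virtual void init(unsigned blockBytes, unsigned degree) = 0;
    // Called on each cache access; returns the addresses to prefetch.
    virtual std::vector<Addr> observeRequest(Addr addr, bool hit,
                                             bool isPrefetch) = 0;
    // Called when a block is allocated in the cache.
    virtual void allocate(Addr addr, bool isPrefetch) = 0;
    // Called on eviction; only the evicted address is known.
    virtual void deallocate(Addr addr) = 0;
};

// Tagged prefetcher sketch built on the interface above.
class TaggedPrefetcher : public AbstractPrefetcher {
  public:
    void init(unsigned blockBytes, unsigned degree) override {
        blockBytes_ = blockBytes;
        degree_ = degree;
        tagged_.clear();
    }

    std::vector<Addr> observeRequest(Addr addr, bool hit,
                                     bool isPrefetch) override {
        std::vector<Addr> out;
        if (isPrefetch) return out;  // train on demand accesses only
        const Addr block = align(addr);
        // erase() returns 1 only if the block still carried its tag.
        const bool firstUse = hit && tagged_.erase(block) == 1;
        if (!hit || firstUse) {
            for (unsigned i = 1; i <= degree_; ++i)
                out.push_back(block + static_cast<Addr>(i) * blockBytes_);
        }
        return out;
    }

    void allocate(Addr addr, bool isPrefetch) override {
        if (isPrefetch) tagged_.insert(align(addr));  // tag prefetched blocks
    }

    void deallocate(Addr addr) override { tagged_.erase(align(addr)); }

  private:
    Addr align(Addr addr) const { return addr / blockBytes_ * blockBytes_; }

    unsigned blockBytes_ = 64;
    unsigned degree_ = 1;
    std::unordered_set<Addr> tagged_;  // prefetched, not yet demanded
};
```

A new engine would override the same four functions and keep whatever private state it needs, which is what makes engines interchangeable behind the wrapper.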


Implementation details: L1 cache controller (MOESI protocol)

• Notifies the wrapper of:
– Cache accesses
– Cache allocations
– Cache evictions

• Reads from the prefetch queue
• Issues a prefetch only when no loads or stores are pending
• Protocol modification similar to a load operation
• Very similar to the current solution
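
The issue policy ("prefetch only when no loads or stores are pending") amounts to a strict priority between two queues. The C++ sketch below shows that arbitration in isolation. The real controller is expressed in the Ruby protocol description rather than in a free function like this, so `selectNext` and `Request` are hypothetical names used only to make the rule concrete.

```cpp
#include <cstdint>
#include <deque>

using Addr = std::uint64_t;

struct Request {
    Addr addr;
    bool isPrefetch;
};

// Hypothetical arbiter: demand requests always win; a prefetch is
// issued only when the demand queue is empty.
// Returns false when there is nothing to issue.
bool selectNext(std::deque<Addr>& demandQueue,
                std::deque<Addr>& prefetchQueue,
                Request& out) {
    if (!demandQueue.empty()) {
        out = Request{demandQueue.front(), false};
        demandQueue.pop_front();
        return true;
    }
    if (!prefetchQueue.empty()) {
        out = Request{prefetchQueue.front(), true};
        prefetchQueue.pop_front();
        return true;
    }
    return false;
}
```

Keeping the `isPrefetch` flag on the request is also what would let later stages, such as the network, tell prefetch traffic apart from demand traffic.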


Implementation details: L2 cache controller (MOESI protocol)

• Same as in the L1, but:

• L2 local hits
– Some data is invalid in the L2 but allocated locally

• L1_GETS does not store in the L2
– The protocol is modified to store prefetch requests in the L2

• The prefetch queue may generate a request for another tile
– The request is forwarded to the corresponding tile
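
The forwarding case can be made concrete with a short C++ sketch. It assumes that consecutive cache blocks are interleaved across the L2 banks, one bank per tile, which is a common mapping for tiled CMPs but is not stated in the slides. The function names are hypothetical.

```cpp
#include <cstdint>

using Addr = std::uint64_t;

// Assumed mapping: block number modulo the number of tiles.
unsigned homeTile(Addr addr, unsigned blockBytes, unsigned numTiles) {
    return static_cast<unsigned>((addr / blockBytes) % numTiles);
}

enum class Route { Local, Forward };

// Decide whether a prefetch generated at thisTile can be handled by the
// local L2 bank or must be forwarded to the bank that owns the address.
Route routePrefetch(Addr target, unsigned thisTile,
                    unsigned blockBytes, unsigned numTiles) {
    return homeTile(target, blockBytes, numTiles) == thisTile
               ? Route::Local
               : Route::Forward;
}
```

Under this mapping a sequential prefetcher at the L2 forwards most of its requests, because the next block usually belongs to a different bank.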


Case study: NoC-aware prefetch performance evaluation

• We tested three classical prefetch engines:
– Tagged Prefetcher
– Reference Prediction Table (RPT)
– Global History Buffer (GHB)

• Using the gem5 simulator with:
– 16 tiled x86 CPUs
– L1 prefetchers
– Ruby memory system
– MOESI coherence protocol
– Garnet network simulator

• PARSEC 2.1 benchmark suite


Case study: Results

M. Torrents, R. Martínez, C. Molina. "Network Aware Performance Evaluation of Prefetching Techniques in CMPs". Simulation Modelling Practice and Theory (SIMPAT), 2014.


Conclusions

• Prefetching is important and must be simulated
• The current solution is adequate
• Our solution goes one step further:
– Easy to change or add prefetch engines
– Detailed statistics about prefetching
– Garnet can identify prefetching traffic
– Useful for statistics or traffic manipulation

• The current tool can easily be included in the new solution
• The current solution is fine for researchers who do not work on prefetching
• Our tool is better suited to prefetching research
