Increasing the Cache Efficiency by Eliminating Noise Philip A. Marshall.

Increasing the Cache Efficiency by Eliminating Noise

Philip A. Marshall

Outline

Background
Motivation for Noise Prediction
Concepts of Noise Prediction
Implementation of Noise Prediction
Related Work
Prefetching
Data Profiling
Conclusion

Background

Cache Fetch
    On Cache Miss
    Prefetch

Exploiting Spatial Locality
    Cache words are fetched in blocks
    Fetch neighboring block(s) on a cache miss
    Results in fewer cache misses
    Fetches words that aren’t needed

Background

Cache noise
    Words that are fetched into the cache but never used

Cache utilization
    The fraction of words in the cache that are used
    Represents how efficiently the cache is used
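
The two definitions above can be written down as a small calculation. This is a minimal sketch, assuming each cached block is represented by a list of per-word "used" flags; the function name and the sample data are illustrative and do not come from the slides.

```python
def cache_utilization(used_flags_per_block):
    """Fraction of fetched words that were referenced at least once."""
    total = sum(len(flags) for flags in used_flags_per_block)
    used = sum(sum(flags) for flags in used_flags_per_block)
    return used / total if total else 0.0

# Two hypothetical 8-word blocks: one barely used, one fully used.
blocks = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
utilization = cache_utilization(blocks)   # 10 used words out of 16 fetched
noise = 1.0 - utilization                 # share of fetched words never used
```

Cache noise is then simply the complement of utilization: every fetched word is either used or noise.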

Motivation for Noise Prediction

Level 1 data cache utilization is ~57% for SPEC2K benchmarks [2]

Fetching unused words:
    Increases bandwidth requirements between cache levels
    Increases hardware and power requirements
    Wastes valuable cache space

[2] D. Burger et al., Memory bandwidth limitations of future microprocessors, Proc. ISCA-23, 1996


Cache block size
    Larger blocks
        Exploit spatial locality better
        Reduce cache tag overhead
        Increase bandwidth requirements
    Smaller blocks
        Reduced cache noise
    Any block size results in suboptimal performance
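
The tag-overhead side of this tradeoff can be made concrete with a short calculation. The sketch below assumes a direct-mapped cache, 32-bit addresses, and power-of-two sizes; none of these are stated on the slide, and `tag_overhead` is a hypothetical helper name.

```python
def tag_overhead(cache_bytes, block_bytes, addr_bits=32):
    """Tag bits stored per data bit (direct-mapped organization is an assumption)."""
    num_blocks = cache_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1    # log2 for power-of-two sizes
    offset_bits = block_bytes.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits / (block_bytes * 8)

small_blocks = tag_overhead(16 * 1024, 32)   # 18 tag bits per 256 data bits
large_blocks = tag_overhead(16 * 1024, 64)   # 18 tag bits per 512 data bits
```

Under these assumptions, doubling the block size halves the relative tag storage, which is the benefit that larger blocks trade against extra bandwidth and noise.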


Sub-blocking
    Only portions of the cache blocks are fetched
    Decreases tag overhead by associating one tag with many sub-blocks
    Words fetched must be in contiguous blocks of fixed size
    High miss rate and cache noise for non-contiguous access patterns


By predicting which words will actually be used, cache noise can be reduced

But:
    Fetching fewer words could increase the number of cache misses

Concepts of Noise Prediction

Selective fetching
    For each block, fetch only the words that are predicted to be accessed
    If no prediction is available, fetch the entire block
    Uses a valid bit for each word and a word-usage bit to track which words have been used
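
The bookkeeping described above can be sketched with two bitmasks per block. This is an illustrative model only: the 8-word block size, the class name, and the choice to fetch a missing word on demand are assumptions, not details taken from the slides.

```python
WORDS_PER_BLOCK = 8                      # assumed block size in words
FULL_MASK = (1 << WORDS_PER_BLOCK) - 1

class CacheBlock:
    def __init__(self, predicted_mask=None):
        # With no prediction available, fall back to fetching the whole block.
        self.valid = FULL_MASK if predicted_mask is None else predicted_mask
        self.used = 0                    # per-word usage bits, all clear at fill time

    def access(self, word):
        """Return True on a word hit, False if the word had not been fetched."""
        bit = 1 << word
        hit = bool(self.valid & bit)
        if not hit:
            self.valid |= bit            # assumption: fetch the missing word on demand
        self.used |= bit                 # record that this word was referenced
        return hit

block = CacheBlock(predicted_mask=0b00000101)   # only words 0 and 2 fetched
first = block.access(0)                          # predicted word: hit
second = block.access(1)                         # unpredicted word: miss
```

When the block is evicted, `used` is the usage pattern that a predictor would record for future fetches.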


Cache Noise Predictors
    Phase Context Predictor (PCP)
        Based on the usage pattern of the most recently evicted block
    Memory Context Predictor (MCP)
        Based on the MSBs of the memory address
    Code Context Predictor (CCP)
        Based on the MSBs of the PC
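
One way to read the three predictors is as a single table of word-usage patterns indexed by different contexts. The sketch below follows that reading; the 32-bit width, the dictionary-backed table, and all function names are assumptions made for illustration, while the 28-bit CCP and 22-bit MCP widths are the ones mentioned later in the deck.

```python
ADDR_BITS = 32   # assumed address and PC width

def msbs(value, n, width=ADDR_BITS):
    """Return the n most significant bits of a width-bit value."""
    return (value >> (width - n)) & ((1 << n) - 1)

def ccp_context(pc, addr, last_evicted_pattern):
    return msbs(pc, 28)               # code context: MSBs of the PC

def mcp_context(pc, addr, last_evicted_pattern):
    return msbs(addr, 22)             # memory context: MSBs of the address

def pcp_context(pc, addr, last_evicted_pattern):
    return last_evicted_pattern       # phase context: most recently evicted block's usage

class NoisePredictor:
    def __init__(self, context_fn):
        self.context_fn = context_fn
        self.table = {}               # context -> last observed usage mask

    def predict(self, pc, addr, last_evicted_pattern):
        """Return a usage mask, or None when there is no prediction."""
        return self.table.get(self.context_fn(pc, addr, last_evicted_pattern))

    def update(self, pc, addr, last_evicted_pattern, used_mask):
        self.table[self.context_fn(pc, addr, last_evicted_pattern)] = used_mask

ccp = NoisePredictor(ccp_context)
ccp.update(0x400100, 0x1000, 0, 0b00000101)
same_code = ccp.predict(0x400104, 0x9000, 0)    # same 28 PC MSBs, different address
other_code = ccp.predict(0x400110, 0x1000, 0)   # different PC MSBs: no prediction
```

A real table would be finite and tagged; an unbounded dictionary is used here only to keep the indexing idea visible.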


Prediction table size
    Larger tables decrease the probability of “no predictions”
    Smaller tables use less power

A prediction is considered successful if all the needed words are fetched
    If extra words are fetched, still considered a success
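
The success criterion above is a subset test on word masks. A minimal sketch, assuming masks are plain integers with one bit per word and `None` stands for a table miss; the function name and outcome labels are illustrative.

```python
def prediction_outcome(predicted_mask, needed_mask):
    """Classify a prediction against the words that were actually needed."""
    if predicted_mask is None:
        return "no prediction"        # table had no entry: whole block is fetched
    if needed_mask & ~predicted_mask:
        return "misprediction"        # at least one needed word was not fetched
    return "success"                  # all needed words fetched, extras allowed

exact = prediction_outcome(0b0101, 0b0101)
with_extras = prediction_outcome(0b0111, 0b0101)
missing_word = prediction_outcome(0b0001, 0b0101)
```

Note that a success with extra words still leaves some noise in the cache, so prediction accuracy and utilization are related but not the same metric.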


Improving Prediction

Miss Initiator Based History (MIBH)

Keep separate histories according to which word in the block caused the miss

Improves predictability if relative position of words accessed is fixed

Example: looping through a struct and accessing only one field
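
The struct example can be simulated to show why separate histories help. This sketch assumes 8-word blocks and a 3-word struct whose first field is read in a forward loop, so the used words shift from block to block; the sizes and helper names are illustrative, and the miss initiator is taken to be the lowest used word because the loop walks forward.

```python
WORDS_PER_BLOCK = 8   # assumed block size in words
STRUCT_WORDS = 3      # assumed struct size; only field 0 is accessed

def used_mask(block_index):
    """Words of this block touched when reading field 0 of every struct."""
    first = block_index * WORDS_PER_BLOCK
    mask = 0
    for word in range(first, first + WORDS_PER_BLOCK):
        if word % STRUCT_WORDS == 0:
            mask |= 1 << (word - first)
    return mask

def miss_initiator(mask):
    """Index of the lowest set bit: the first word touched in a forward loop."""
    return (mask & -mask).bit_length() - 1

def successful_predictions(use_mibh, num_blocks=12):
    history = {}
    successes = 0
    for b in range(num_blocks):
        actual = used_mask(b)
        key = miss_initiator(actual) if use_mibh else 0
        predicted = history.get(key)
        if predicted is not None and actual & ~predicted == 0:
            successes += 1
        history[key] = actual         # remember the latest pattern for this key
    return successes

with_mibh = successful_predictions(use_mibh=True)
without_mibh = successful_predictions(use_mibh=False)
```

In this toy setup the usage pattern repeats every three blocks, so a single shared history always predicts the previous block's (shifted) pattern, while per-initiator histories predict correctly once each initiator has been seen.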


Improving Prediction

OR-ing Previous Two Histories (OPTH)

Increases predictability by looking at more than the most recent access

Reduces cache utilization

OR-ing more than two accesses reduces utilization substantially
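
The OPTH idea reduces to a bitwise OR over the two most recent usage masks. A minimal sketch, assuming the history is kept as a list with the newest mask last; the function names are illustrative.

```python
def opth_predict(history):
    """OR the two most recent usage masks; None if there is no history yet."""
    if not history:
        return None
    prediction = 0
    for mask in history[-2:]:
        prediction |= mask
    return prediction

def words_in(mask):
    return bin(mask).count("1")

history = [0b0011, 0b0101]             # hypothetical last two usage patterns
last_only = history[-1]
combined = opth_predict(history)
extra_words = words_in(combined) - words_in(last_only)
```

The combined mask covers either recent pattern, which helps predictability, but every extra bit is a word that may be fetched without being used, which is the utilization cost noted above.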

Results

Empirically, CCP provides the best results

MIBH greatly increases predictability

OPTH improves predictability only marginally while increasing cache noise

Cache utilization increased from 57% to 92%

Results

Related Work

Existing work focuses on reducing cache misses, not on improving utilization

Sub-blocked caches used mainly to decrease tag overhead

Some existing work on prediction of which sub-blocks to load in a sub-blocked cache

No existing techniques for predicting and fetching non-contiguous words

Related Work

Prefetching

Prefetching improves the cache miss rate

Commonly, prefetching is implemented by also fetching the next block on a cache miss

Prefetching increases noise and increases bandwidth requirements

Prefetching

Noise prediction leads to more intelligent prefetching but requires extra hardware

On average, prefetching with noise prediction leads to less energy consumption

In the worst case, energy requirements increase
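
The bandwidth side of this comparison can be illustrated by counting words moved per miss under next-block prefetching. This is a rough sketch with an assumed 8-word block and made-up prediction masks; it counts transferred words only and does not model the energy of the predictor hardware, which is where the worst case above comes from.

```python
WORDS_PER_BLOCK = 8   # assumed block size in words

def words_fetched(predicted_mask):
    """Words transferred for one block; None means no prediction, so fetch all."""
    if predicted_mask is None:
        return WORDS_PER_BLOCK
    return bin(predicted_mask).count("1")

def miss_traffic(demand_mask, prefetch_mask):
    """Words moved on one miss: the demanded block plus the prefetched next block."""
    return words_fetched(demand_mask) + words_fetched(prefetch_mask)

plain_prefetch = miss_traffic(None, None)                  # both blocks in full
filtered_prefetch = miss_traffic(0b00010001, 0b00010001)   # hypothetical predictions
```

With these illustrative masks the filtered case moves a quarter of the words, but the saving depends entirely on how sparse and how accurate the predictions are.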

Prefetching

Data Profiling

For some benchmarks, the number of predictions is low

The predictor table is too small to hold all the word usage histories

Instead of increasing the table size, profile the data

Profiling increases the prediction rate by ~7%

Gains aren’t as high as expected

Data Profiling

Analysis of Noise Prediction

Pros
    Small increase in miss rate (0.1%)
    Decreased power requirements in most cases
    Decreased bandwidth requirements between cache levels
    Adapts effective block size to access patterns
    Dynamic technique, but profiling can be used
    Scalable to different predictor sizes

Analysis of Noise Prediction

Cons
    Increased hardware overhead
    Increases power in the worst case
    Not all programs benefit
    Profiling provides limited improvement

Other Thoughts

How were benchmarks chosen?
    6 of 12 integer and 8 of 14 floating point SPEC2K benchmarks were used
Not all predictors were examined equally
    22-bit MCP predictor performed slightly worse than a 28-bit CCP
    28-bit MCP?
How can the efficiency of the prediction table be increased?
