Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Page 1: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems

G. Pokam and F. Bodin

Page 2: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Motivation (1/3)

High performance is hard to reconcile with low power. Consider the cache hierarchy, for instance.

Benefits of large caches:
keep the embedded code + data workload on-chip
reduce off-chip memory traffic

However, caches account for ~80% of the transistor count, and we usually devote half of the chip area to caches.

Page 3: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Motivation (2/3)

Cache impact on energy consumption:

static energy is disproportionately large compared with the rest of the chip: the ~80% of transistors in caches contribute steadily to leakage power

dynamic energy (transistor switching activity) represents an important fraction of the total energy, due to the high access frequency of caches

Cache design is therefore critical in the context of high-performance embedded systems.

Page 4: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Motivation (3/3)

We seek to address cache energy management via hardware/software interaction.

Is there a good way to achieve this? Yes: add flexibility so that the cache can be reconfigured efficiently.

How? Follow program phases and adapt the cache structure accordingly.

Page 5: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Previous work (1/2)

Configurable cache proposals that apply to embedded systems include:

Albonesi [MICRO'99]: selective cache ways, which disable or enable individual cache ways of a highly set-associative cache

Zhang et al. [ISCA'03]: way-concatenation, which reduces the cache associativity while still maintaining the full cache capacity

Page 6: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Previous work (2/2)

These approaches only consider configuration on a per-application basis.

Problems:
empirically, no single best cache size exists for a given application
the dynamic cache behavior varies within an application, and from one application to another

Therefore, these approaches do not accommodate program phase changes well.

Page 7: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Our approach

Objective: emphasize application-specific cache architectural parameters.

To do so, we consider a cache with a fixed line size and a modulus set-mapping function:
power/performance is dictated by size and associativity
not all dynamic program phases have the same requirements on cache size and associativity!

We therefore vary size and associativity dynamically to leverage the power/performance tradeoff at phase level.

Page 8: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Cache model (1/8)

Baseline cache model: the way-concatenation cache [Zhang ISCA'03].

Functionality of the way-concatenation cache:
on each cache lookup, a logic selects the number of active cache ways m out of the n available cache ways
virtually, each active cache way is a multiple of the size of a single bank in the base n-way cache.
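
As a concrete illustration, here is a minimal C sketch of this lookup selection, assuming the 32 KB, 4-bank, 32 B-line data cache used in the experiments; the bank grouping (banks 0 and 2 forming one 2-way pair, as in the scenario of Cache model (3/8)) and all names are illustrative, not the authors' hardware:

#include <stdint.h>

#define LINE_BITS 5        /* 32 B line -> 5 offset bits       */
#define BANK_SETS 256      /* 8 KB / 32 B = 256 sets per bank  */
#define NUM_BANKS 4

/* Bitmask of the physical banks to probe for 'addr' when the cache is
 * configured with m logical ways (m in {1, 2, 4}). The full 32 KB is kept:
 * unused associativity becomes extra index bits that select the bank(s). */
static uint8_t banks_to_probe(uint32_t addr, int m)
{
    uint32_t set      = (addr >> LINE_BITS) % (BANK_SETS * (NUM_BANKS / m));
    uint32_t bank_sel = set / BANK_SETS;             /* high index bits */

    switch (m) {
    case 1:  return (uint8_t)(1u << bank_sel);       /* 32K, direct-mapped   */
    case 2:  return bank_sel ? 0x0Au : 0x05u;        /* pairs {1,3} or {0,2} */
    default: return 0x0Fu;                           /* 32K, full 4-way      */
    }
}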

Page 9: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Cache model (2/8)

Our proposal:

modify the associativity while guaranteeing cache coherency

modify the cache size while preserving data availability in the unused cache portions

Page 10: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Cache model (3/8)

First enhancement: associativity level.

Problem with the baseline model; consider the following scenario (banks 0 to 3):

Phase 0: 32K 2-way, active banks are 0 and 2; @A is brought into bank 0.

Phase 1: 32K 1-way, active bank is 2; @A is modified. Bank 2 now holds the new value of @A while bank 0 still holds an old copy, which must be invalidated.

Page 11: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Cache model (4/8)

Proposed solution: assume a write-through cache.

The unused tag and status arrays must be made accessible on a write to ensure coherency across cache configurations => associative tag array.

Actions of the cache controller: on a write request, access all tag arrays and set the corresponding status bit to invalid.
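
A minimal sketch of that controller action, assuming a 4-way organization; the structure and function names are illustrative:

#include <stdint.h>
#include <stdbool.h>

#define NUM_WAYS 4

struct tag_entry { uint32_t tag; bool valid; };

/* On every write, the tags of ALL ways, including ways that are inactive in
 * the current configuration, are probed so any stale copy is invalidated.
 * Write-through: the data also goes to main memory (not shown). */
void on_write(struct tag_entry set_tags[NUM_WAYS], uint32_t tag, int hit_way)
{
    for (int w = 0; w < NUM_WAYS; w++) {
        if (w == hit_way)
            continue;                      /* keep the copy just written */
        if (set_tags[w].valid && set_tags[w].tag == tag)
            set_tags[w].valid = false;     /* invalidate stale copies, even in
                                              currently inactive ways     */
    }
}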

Page 12: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Cache model (5/8)

Second enhancement: cache size level.

Problem with the baseline model: gated-Vdd is used to disconnect a bank => data are not preserved across two configurations!

Proposed solution: unused cache ways are put in a low-power mode => drowsy mode [Flautner et al. ISCA'02]; the tag portion is left unchanged.

Main advantage: we can reduce the cache size, preserve the state of the unused memory cells across program phases, and still reduce leakage energy.
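
A minimal sketch of how an access would interact with a drowsy way under this scheme, assuming the 1-cycle wake-up delay of [Flautner et al. ISCA'02]; the data structure is illustrative:

#include <stdbool.h>

/* Data arrays of unused ways sit at a state-retaining low voltage while
 * their tags stay at full voltage; touching a drowsy line first wakes it. */
struct way_state {
    bool drowsy;             /* data array in low-voltage, state-preserving mode */
};

/* Returns the extra latency (in cycles) paid to access way 'w'. */
int wake_if_drowsy(struct way_state *w)
{
    if (w->drowsy) {
        w->drowsy = false;   /* restore full voltage; contents were preserved */
        return 1;            /* 1-cycle wake-up penalty */
    }
    return 0;
}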

Page 13: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Cache model (6/8)

Overall cache model

Page 14: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Cache model (8/8)

The drowsy circuitry accounts for less than 3% of the chip area.

Accessing a line in drowsy mode requires a 1-cycle delay [Flautner et al. ISCA'02].

ISA extension: we assume the ISA can be extended with a reconfiguration instruction having the following effects on the WCR:

way-mask   drowsy bit   config
0          0/1          32K 1W / 8K 1W
1          0/1          32K 2W / 16K 1W
2          0/1          32K 2W / 16K 2W
3          0            32K 4W
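
As an illustration only, a hypothetical C decode of the WCR fields into a (size, associativity) pair; the encoding and names are assumptions read off the table, not the actual instruction semantics:

#include <stdio.h>

struct cache_cfg { int size_kb; int assoc; };

struct cache_cfg decode_wcr(int way_mask, int drowsy)
{
    /* [way_mask][drowsy], per the configuration table above */
    static const struct cache_cfg table[4][2] = {
        { {32, 1}, { 8, 1} },   /* way-mask 0: 32K 1W / 8K 1W             */
        { {32, 2}, {16, 1} },   /* way-mask 1: 32K 2W / 16K 1W            */
        { {32, 2}, {16, 2} },   /* way-mask 2: 32K 2W / 16K 2W            */
        { {32, 4}, {32, 4} },   /* way-mask 3: 32K 4W (drowsy bit unused) */
    };
    return table[way_mask & 3][drowsy & 1];
}

int main(void)
{
    struct cache_cfg c = decode_wcr(1, 1);
    printf("%dK %d-way\n", c.size_kb, c.assoc);   /* prints "16K 1-way" */
    return 0;
}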

Page 15: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Trace-based analysis (1/3)

Goal: extract performance and energy profiles from the trace in order to adapt the cache structure to the dynamic application requirements.

Assumptions:
LRU replacement policy
no prefetching

Page 16: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Trace-based analysis (2/3)

Notation:
i = sample interval
map_j = set mapping function (for varying the associativity)
d_x = LRU-stack distance (for varying the cache size)

Then, define the LRU-stack profiles:

P_i(map_j(x)) : performance

For each pair (map_j, x), this expression gives the number of dynamic references in sample interval i that hit in a cache with set mapping map_j and LRU-stack distance d_x.
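
As an illustration, a rough C sketch of gathering such a profile from an address trace, assuming a single set-mapping function, 32 B lines, and small stack depths; per-interval bookkeeping is omitted and all names are illustrative:

#include <stdint.h>
#include <string.h>

#define MAX_DIST 4                                /* track stack distances 0..3      */
#define NUM_SETS 256

static uint32_t lru_stack[NUM_SETS][MAX_DIST];    /* per-set tags, most recent first */
static uint64_t profile[MAX_DIST + 1];            /* hits at distance d; last = miss */

static int map_j(uint32_t addr) { return (addr >> 5) % NUM_SETS; }   /* 32 B lines */

void record_reference(uint32_t addr)
{
    int set = map_j(addr);
    uint32_t tag = addr >> 5;
    int dist = MAX_DIST;                          /* beyond tracked depth => miss */

    for (int d = 0; d < MAX_DIST; d++)
        if (lru_stack[set][d] == tag) { dist = d; break; }

    profile[dist]++;            /* a reference at stack distance d hits in any
                                   configuration whose associativity exceeds d */

    /* move 'tag' to the top of this set's LRU stack */
    int shift = (dist < MAX_DIST) ? dist : MAX_DIST - 1;
    memmove(&lru_stack[set][1], &lru_stack[set][0], shift * sizeof(uint32_t));
    lru_stack[set][0] = tag;
}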

Page 17: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Trace-based analysis (3/3)

E_i(map_j(x)) : energy

E_i(map_j(x)) = P_i(map_j(x)) * E_cache     (cache energy)
              + Tag_i * E_tag               (tag energy)
              + N_i * E_drowsy              (energy of drowsy transitions)
              + Write_i * E_memory          (memory energy)

where E_memory = ratio * E_cache
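
This model translates directly into C; the parameter names mirror the formula above and their interpretation in the comments is an assumption, while ratio = 50 is the value estimated in the experimental setup:

double interval_energy(double P_hits,       /* P_i(map_j(x)): references served by the cache */
                       double tag_accesses, /* Tag_i: tag-array accesses in interval i       */
                       double drowsy_trans, /* N_i: drowsy-mode transitions in interval i    */
                       double writes,       /* Write_i: write-throughs to memory             */
                       double E_cache, double E_tag, double E_drowsy)
{
    const double ratio = 50.0;              /* estimated memory/cache energy ratio */
    double E_memory = ratio * E_cache;

    return P_hits       * E_cache           /* cache energy             */
         + tag_accesses * E_tag             /* tag energy               */
         + drowsy_trans * E_drowsy          /* drowsy transition energy */
         + writes       * E_memory;         /* memory energy            */
}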

Page 18: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Experimental setup (1/2)

Focus on the data cache.

Simulation platform:
4-issue VLIW processor [Faraboschi et al. ISCA'00]
32KB 4-way data cache, 32B block size, 20-cycle miss penalty

Benchmarks:
MiBench: fft, gsm, susan
MediaBench: mpeg, epic
PowerStone: summin, whestone, v42bis
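
For reference, the simulated data-cache parameters above can be written down as a small configuration record; the struct and field names are illustrative:

/* Baseline data-cache configuration used in the simulations. */
struct dcache_params {
    int size_bytes;        /* 32 KB           */
    int associativity;     /* 4-way           */
    int block_bytes;       /* 32 B block size */
    int miss_penalty;      /* 20 cycles       */
};

static const struct dcache_params baseline_dcache = {
    .size_bytes    = 32 * 1024,
    .associativity = 4,
    .block_bytes   = 32,
    .miss_penalty  = 20,
};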

Page 19: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Experimental setup (2/2)

CACTI 3.0: used to obtain energy values; we extend it to provide leakage energy values for each simulated cache configuration.

HotLeakage: from which we adapted the leakage energy calculation for each simulated leakage reduction technique.

Estimated memory ratio = 50; drowsy energy taken from [Flautner et al. ISCA'02].

Page 20: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Program behavior (1/4)

Figure: GSM energy/performance profile (log10 scale on both axes), comparing all 32K configurations, all 16K configurations, and the 8K configuration; the annotations mark an insensitive region, a tradeoff region, a sensitive region, and a capacity-miss effect.

Page 21: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Program behavior (2/4)

FFT

Page 22: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Program behavior (3/4)

Working set size sensitivity property: the working set can be partitioned into clusters with similar cache sensitivity.

Capturing sensitivity through working set size clustering: the partitioning is done relative to the base cache configuration.

We use a simple metric based on the Manhattan distance between two points v_i^{k1} and v_i^{k2}, i.e. |v_i^{k2} - v_i^{k1}|.
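
A minimal sketch of that metric, assuming each point is a vector of per-configuration statistics for interval i (what the vector holds exactly is an assumption here):

#include <math.h>
#include <stddef.h>

/* Manhattan (L1) distance between two working-set vectors v^{k1}_i and v^{k2}_i. */
double manhattan_distance(const double *v1, const double *v2, size_t dim)
{
    double dist = 0.0;
    for (size_t d = 0; d < dim; d++)
        dist += fabs(v2[d] - v1[d]);   /* sum of coordinate-wise differences */
    return dist;
}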

Page 23: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Program behavior (4/4)

More energy/performance profiles (figures): summin and whestone.

Page 24: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Results (1/3)

Dynamic energy reduction

Page 25: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Results (2/3)

Leakage energy savings (0.07 µm)

Chart annotation: better due to gated-Vdd

Page 26: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Results (3/3)

Performance: worst-case degradation (65% due to drowsy transitions)

Page 27: Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems


Conclusions and future work

We can do better for performance: reduce the frequency of drowsy transitions within a phase with refined cache bank access policies.

Management of reconfiguration at the compiler level:
insert BB (basic block) annotations in the trace
exploit feedback-directed compilation

Overall, this is a promising scheme for embedded systems.