Block size for caches

CS 352: Computer Organization and Design, University of Wisconsin-Eau Claire, Dan Ernst

Transcript of Block size for caches

Page 1: Block size for caches

Block size for caches

[Figure: processor with registers R0-R3, a cache with valid (V), tag, and data fields, and a 16-byte memory drawn as blocks 0-7.]

Cache: 2 cache lines, 2-byte blocks, 3-bit tag field.
Memory contents at addresses 0-15: 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250.
Reference stream: Ld R1 M[1]; Ld R2 M[5]; Ld R3 M[1]; Ld R3 M[4]; Ld R2 M[0].

Page 2: Block size for caches

Block size for caches

[Figure: initial state of the same example. Both cache lines are invalid (V = 0).]

Page 3: Block size for caches

Block size for caches

[Figure: Ld R1 M[1] misses. Addr: 0001, where the low bit is the block offset. The block with tag 0 (data 100, 110) is loaded and R1 = 110. Misses: 1, Hits: 0.]

Page 4: Block size for caches

Block size for caches

[Figure: same state as the previous frame, without the address annotation. Misses: 1, Hits: 0.]

Page 5: Block size for caches

Block size for caches

[Figure: Ld R2 M[5] misses. Addr: 0101. The block with tag 2 (data 140, 150) is loaded into the second line and R2 = 150. Misses: 2, Hits: 0.]

Page 6: Block size for caches

Block size for caches

[Figure: same state as the previous frame, without the address annotation. Misses: 2, Hits: 0.]

Page 7: Block size for caches

Block size for caches

[Figure: Ld R3 M[1] hits in the block with tag 0, so R3 = 110. Misses: 2, Hits: 1.]

Page 8: Block size for caches

Block size for caches

[Figure: same state as the previous frame. Misses: 2, Hits: 1.]

Page 9: Block size for caches

Block size for caches

[Figure: Ld R3 M[4] hits in the block with tag 2, so R3 = 140. Misses: 2, Hits: 2.]

Page 10: Block size for caches

Block size for caches

[Figure: same state as the previous frame. Misses: 2, Hits: 2.]

Page 11: Block size for caches

Block size for caches

[Figure: Ld R2 M[0] hits in the block with tag 0, so R2 = 100. Final counts: Misses: 2, Hits: 3.]
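The walkthrough above can be replayed with a small simulator. This is a minimal sketch rather than anything from the slides: `simulate` and its parameters are hypothetical names, and it tracks only block numbers, not data. It assumes a fully associative cache with LRU replacement, and compares the slide's configuration (2 lines of 2 bytes) against 4 lines of 1 byte, which stores the same amount of data.

```python
from collections import OrderedDict

def simulate(addresses, num_lines, block_size):
    """Fully associative cache with LRU replacement; returns (misses, hits)."""
    cache = OrderedDict()  # keys are block numbers, least recently used first
    misses = hits = 0
    for addr in addresses:
        block = addr // block_size  # drop the block-offset bits
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses, hits

loads = [1, 5, 1, 4, 0]  # Ld M[1], M[5], M[1], M[4], M[0]
print(simulate(loads, num_lines=2, block_size=2))  # (2, 3)
print(simulate(loads, num_lines=4, block_size=1))  # (4, 1)
```

With 2-byte blocks, the loads of M[4] and M[0] hit because the earlier loads of M[5] and M[1] already brought in their neighbors.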

Page 12: Block size for caches

Basic Cache organization

• Decide on the block size
  – How? Simulate lots of different block sizes and see which one gives the best performance
  – Most systems use a block size between 32 bytes and 128 bytes
  – Longer sizes reduce the overhead by:
    • Reducing the number of bits in each TAG
    • Reducing the size of each TAG array
  – Very large sizes reduce the “usefulness” of the extra data
    • Spatial locality: the closer it is, the more likely it will be used

[Figure: an address divided into Tag and Block offset fields.]
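The overhead argument can be made concrete with a little arithmetic. The sketch below is an illustrative model only: `overhead_bits` is a hypothetical helper, and it assumes a fully associative cache, power-of-two block sizes, one valid bit per line, and (for the larger cases) 32-bit addresses and 32 KB of data storage, none of which come from the slides.

```python
def overhead_bits(data_bytes, block_size, address_bits=32):
    """Tag plus valid-bit storage, in bits, for a fully associative cache."""
    num_lines = data_bytes // block_size
    offset_bits = block_size.bit_length() - 1  # log2 of a power-of-two block size
    tag_bits = address_bits - offset_bits      # everything above the block offset
    return num_lines * (tag_bits + 1)          # +1 for the valid bit

# The slide's tiny cache: 2 lines x 2 bytes, 4-bit addresses, 3-bit tags.
print(overhead_bits(4, 2, address_bits=4))  # 8

# Assumed 32 KB cache with 32-bit addresses.
for block_size in (32, 64, 128):
    print(block_size, overhead_bits(32 * 1024, block_size))
# 32 28672
# 64 13824
# 128 6656
```

Doubling the block size halves the number of tags and also removes one bit from each tag, which is the two-part saving listed on the slide.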

Page 13: Block size for caches

Questions to ask about a cache

• What is the block size?
• How many lines?
• How many bytes of data storage?
• How much overhead storage?
• What is the hit rate?
• What is the latency of an access?
• What is the replacement policy?
  – LRU? LFU? FIFO? Random?

The Design Space is Large

Page 14: Block size for caches

What about stores?

• Where should you write the result of a store?
  – If that memory location is in the cache?
    • Send it to the cache
    • Should we also send it to memory? (write-through policy)
  – If it is not in the cache?
    • Write it directly to memory without allocation? (write-around policy)
    • OR allocate the line (put it in the cache)? (allocate-on-write policy)

Page 15: Block size for caches

Handling stores (write-through)

[Figure: processor with registers R0-R3, a two-line cache with valid (V), tag, and data fields, and a 16-byte memory. Both cache lines are invalid. Misses: 0, Hits: 0.]

Memory contents at addresses 0-15: 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225.
Reference stream: Ld R1 M[1]; Ld R2 M[7]; St R2 M[0]; St R1 M[5]; Ld R2 M[10].

Page 16: Block size for caches

write-through (REF 1)

[Figure: state before reference 1 (Ld R1 M[1]). Both cache lines are invalid. Misses: 0, Hits: 0.]

Page 17: Block size for caches

write-through (REF 1)

[Figure: state after reference 1. The block with tag 0 (data 78, 29) is loaded and R1 = 29. Misses: 1, Hits: 0.]

Page 18: Block size for caches

write-through (REF 2)

[Figure: state before reference 2 (Ld R2 M[7]). Misses: 1, Hits: 0.]

Page 19: Block size for caches

write-through (REF 2)

[Figure: state after reference 2. The block with tag 3 (data 162, 173) is loaded into the second line and R2 = 173. Misses: 2, Hits: 0.]

Page 20: Block size for caches

write-through (REF 3)

[Figure: state before reference 3 (St R2 M[0]). Misses: 2, Hits: 0.]

Page 21: Block size for caches

write-through (REF 3)

[Figure: state after reference 3. The store hits in the block with tag 0; 173 is written to the cached copy and to memory address 0. Misses: 2, Hits: 1.]

Page 22: Block size for caches

write-through (REF 4)

[Figure: state before reference 4 (St R1 M[5]). Memory address 0 now holds 173. Misses: 2, Hits: 1.]

Page 23: Block size for caches

write-through (REF 4)

[Figure: state after reference 4. The store misses; the block with tag 2 is allocated in place of the least recently used block (tag 3), and 29 is stored to address 5. Misses: 3, Hits: 1.]

Page 24: Block size for caches

write-through (REF 5)

[Figure: state before reference 5 (Ld R2 M[10]). Memory address 5 now holds 29. Misses: 3, Hits: 1.]

Page 25: Block size for caches

write-through (REF 5)

[Figure: state after reference 5. The load misses; the block with tag 5 (data 33, 28) replaces the block with tag 0, and R2 = 33. Final counts: Misses: 4, Hits: 1.]

Page 26: Block size for caches

How many memory references?

• Every time we STORE, we go all the way to memory
  – Even if we hit in the cache!

Caches generally miss less than 10% of the time.
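The write-through walkthrough can be replayed in code to count how often memory is written. This is a hedged sketch, not the slides' own material: `run_write_through`, `MEMORY`, and `REFS` are hypothetical names, and the model assumes a fully associative cache with 2 lines, 2-byte blocks, LRU replacement, and allocate-on-write, as the frames appear to show.

```python
from collections import OrderedDict

MEMORY = [78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225]
REFS = [("Ld", "R1", 1), ("Ld", "R2", 7), ("St", "R2", 0), ("St", "R1", 5), ("Ld", "R2", 10)]

def run_write_through(refs, memory, num_lines=2, block_size=2):
    """Returns (misses, hits, memory_writes, registers); updates memory in place."""
    cache = OrderedDict()  # block number -> list of data bytes, LRU first
    regs = {}
    misses = hits = memory_writes = 0
    for op, reg, addr in refs:
        block, offset = divmod(addr, block_size)
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # memory is always current, so just drop it
            start = block * block_size
            cache[block] = memory[start:start + block_size]  # copy the block in
        if op == "Ld":
            regs[reg] = cache[block][offset]
        else:  # "St": update the cached copy and memory together
            cache[block][offset] = regs[reg]
            memory[addr] = regs[reg]
            memory_writes += 1
    return misses, hits, memory_writes, regs

memory = list(MEMORY)
print(run_write_through(REFS, memory))  # (4, 1, 2, {'R1': 29, 'R2': 33})
print(memory[0], memory[5])             # 173 29
```

Both stores reach memory, including the one to M[0] that hit in the cache, so the number of memory writes equals the number of stores.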

Page 27: Block size for caches

Write-through vs. Write-back

• Can we design the cache to NOT write all stores to memory immediately?
  – We can keep the most current copy JUST in the cache
  – If that data gets evicted from the cache, update memory (a write-back policy)
    • We don’t want to lose the data!
  – Do we need to write-back all evicted blocks?
    • No, only blocks that have been stored into
  – Keep a “dirty bit”, reset when the block is allocated, set when the block is stored into. If a block is “dirty” when evicted, write its data back into memory.
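The dirty-bit rule can be sketched as a small simulator. Everything named here (`run_write_back`, `MEMORY`, `REFS`) is hypothetical; the memory contents and reference stream are the ones used in the deck's store examples, and the model assumes a fully associative cache with 2 lines, 2-byte blocks, LRU replacement, and allocate-on-write.

```python
from collections import OrderedDict

MEMORY = [78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225]
REFS = [("Ld", "R1", 1), ("Ld", "R2", 7), ("St", "R2", 0), ("St", "R1", 5), ("Ld", "R2", 10)]

def run_write_back(refs, memory, num_lines=2, block_size=2):
    """Returns (misses, hits, write_backs, registers); updates memory in place."""
    cache = OrderedDict()  # block number -> {"data": [...], "dirty": bool}, LRU first
    regs = {}
    misses = hits = write_backs = 0
    for op, reg, addr in refs:
        block, offset = divmod(addr, block_size)
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) == num_lines:
                victim, line = cache.popitem(last=False)
                if line["dirty"]:  # only stored-into blocks go back to memory
                    start = victim * block_size
                    memory[start:start + block_size] = line["data"]
                    write_backs += 1
            start = block * block_size
            # Dirty bit is reset when the block is allocated.
            cache[block] = {"data": memory[start:start + block_size], "dirty": False}
        line = cache[block]
        if op == "Ld":
            regs[reg] = line["data"][offset]
        else:  # "St": update only the cached copy and set the dirty bit
            line["data"][offset] = regs[reg]
            line["dirty"] = True
    return misses, hits, write_backs, regs

memory = list(MEMORY)
print(run_write_back(REFS, memory))  # (4, 1, 1, {'R1': 29, 'R2': 33})
print(memory[0], memory[5])          # 173 150
```

Memory address 5 still holds 150 at the end because the block containing it is dirty but has not been evicted yet; only the evicted dirty block (addresses 0-1) was written back.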

Page 28: Block size for caches

Handling stores (write-back)

[Figure: processor with registers R0-R3, a two-line cache with valid (V), dirty (d), tag, and data fields, and a 16-byte memory. Both cache lines are invalid. Misses: 0, Hits: 0.]

Memory contents at addresses 0-15: 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225.
Reference stream: Ld R1 M[1]; Ld R2 M[7]; St R2 M[0]; St R1 M[5]; Ld R2 M[10].

Page 29: Block size for caches

write-back (REF 1)

[Figure: state before reference 1 (Ld R1 M[1]). Both cache lines are invalid. Misses: 0, Hits: 0.]

Page 30: Block size for caches

write-back (REF 1)

[Figure: state after reference 1. The block with tag 0 (data 78, 29) is loaded clean (d = 0) and R1 = 29. Misses: 1, Hits: 0.]

Page 31: Block size for caches

write-back (REF 2)

[Figure: state before reference 2 (Ld R2 M[7]). Misses: 1, Hits: 0.]

Page 32: Block size for caches

write-back (REF 2)

[Figure: state after reference 2. The block with tag 3 (data 162, 173) is loaded into the second line and R2 = 173. Misses: 2, Hits: 0.]

Page 33: Block size for caches

write-back (REF 3)

[Figure: state before reference 3 (St R2 M[0]). Misses: 2, Hits: 0.]

Page 34: Block size for caches

write-back (REF 3)

[Figure: state after reference 3. The store hits in the block with tag 0; the cached copy becomes 173, 29 and its dirty bit is set. Memory address 0 still holds 78. Misses: 2, Hits: 1.]

Page 35: Block size for caches

write-back (REF 4)

[Figure: state before reference 4 (St R1 M[5]). Misses: 2, Hits: 1.]

Page 36: Block size for caches

write-back (REF 4)

[Figure: state after reference 4. The store misses; the block containing address 5 is allocated in place of the clean least recently used block, 29 is written into the cached copy (data 71, 29), and the line is marked dirty. Misses: 3, Hits: 1.]

Page 37: Block size for caches

write-back (REF 5)

[Figure: state before reference 5 (Ld R2 M[10]). Both lines are valid and dirty. Misses: 3, Hits: 1.]

Page 38: Block size for caches

write-back (REF 5)

[Figure: reference 5 misses. The least recently used block (tag 0) is dirty, so its data is written back and 173 goes to memory address 0. Misses: 4, Hits: 1.]

Page 39: Block size for caches

write-back (REF 5)

[Figure: state after reference 5. Memory address 0 holds 173; the block with tag 5 (data 33, 28) is loaded clean and R2 = 33. Final counts: Misses: 4, Hits: 1.]

Page 40: Block size for caches

Where does write-back save us?

• We write the data to memory eventually anyway. How is this better than write-through?
• If a value is written repeatedly, it only gets updated in the cache. It doesn’t have to store to memory every time!
  – Think: loop counter, running sum, etc.
• Result: fewer total trips to memory, lower latency for stores
• If your data set fits in the cache, you can essentially skip going to memory beyond the initial load-up of program values!
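The loop-counter case can be put in numbers. The sketch below is a deliberately simplified model with hypothetical names (`memory_writes`, address 8, 1000 iterations): it assumes the cache is large enough that nothing is evicted until a single flush at the end, and it counts write operations, not bytes.

```python
def memory_writes(store_addresses, policy, block_size=2):
    """Count writes that reach memory, assuming no evictions until a final flush."""
    if policy == "write-through":
        return len(store_addresses)  # every store goes all the way to memory
    if policy == "write-back":
        dirty_blocks = {addr // block_size for addr in store_addresses}
        return len(dirty_blocks)     # each dirty block is written once, at the flush
    raise ValueError(f"unknown policy: {policy}")

counter_updates = [8] * 1000  # a loop counter stored 1000 times at one address
print(memory_writes(counter_updates, "write-through"))  # 1000
print(memory_writes(counter_updates, "write-back"))     # 1
```

Each write-back transfer moves a whole block rather than a single value, so the saving comes from repeated stores to the same block, not from the individual transfers being smaller.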

Page 41: Block size for caches

What about instructions?

• Instructions should be cached as well.
• We have two choices:
  1. Treat instruction fetches as normal data and allocate cache blocks when fetched.
  2. Create a second cache (called the instruction cache or ICache) which caches instructions only.
• What are advantages of a separate ICache?
• Can anything go wrong with this?

Page 42: Block size for caches

Cache Associativity

Balancing speed with capacity

Page 43: Block size for caches

Associativity

• We designed a fully associative cache.
  – Any memory location can be copied to any cache block.
  – We check every cache tag to determine whether the data is in the cache.
• This approach is too slow for large caches
  – Parallel tag searches are slow and use a lot of power
  – OK for a few entries… but hundreds/thousands is not feasible

Page 44: Block size for caches

Direct mapped cache

• We can redesign the cache to eliminate the requirement for parallel tag lookups.
  – Direct mapped caches partition memory into as many regions as there are cache lines
  – Each memory block has a single cache line in which data can be placed.
  – You then only need to check a single tag: the one associated with the region the reference is located in.
• Think: Modulus Hash Function

Page 45: Block size for caches

Mapping memory to cache

[Figure: the 16-byte memory mapped onto a direct-mapped cache with four lines (0-3), each with tag and data fields.]

Address fields: tag (1 bit), line index (2 bits), block offset (1 bit).
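The address split on this slide (1-bit tag, 2-bit line index, 1-bit block offset) is the "modulus hash" idea written as bit fields. A minimal sketch, with hypothetical function names, showing that both views pick the same line:

```python
def split_address(addr, index_bits=2, offset_bits=1):
    """Split an address into (tag, line index, block offset) bit fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (index_bits + offset_bits)
    return tag, index, offset

def modulus_line(addr, num_lines=4, block_size=2):
    """The same line index computed as a modulus hash of the block number."""
    return (addr // block_size) % num_lines

for addr in (5, 13):
    print(addr, split_address(addr), modulus_line(addr))
# 5 (0, 2, 1) 2
# 13 (1, 2, 1) 2
```

Addresses 5 and 13 land on the same line and differ only in the tag, so a direct-mapped cache can hold one of their blocks at a time.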

Page 46: Block size for caches

Direct-mapped cache

[Figure: processor with registers R0-R3, a two-line direct-mapped cache with valid (V), dirty (d), tag, and data fields, and the same 16-byte memory as the store examples. Both lines are invalid. Misses: 0, Hits: 0.]

Reference stream: Ld R1 M[1]; Ld R2 M[5]; St R2 M[2]; St R1 M[7]; Ld R2 M[4].

Page 47: Block size for caches

Direct-mapped (REF 1)

[Figure: state before reference 1 (Ld R1 M[1]). Both lines are invalid. Misses: 0, Hits: 0.]

Page 48: Block size for caches

Direct-mapped (REF 1)

[Figure: state after reference 1. The first line holds the block with tag 0 (data 78, 29) and R1 = 29. Misses: 1, Hits: 0.]

Page 49: Block size for caches

Direct-mapped (REF 2)

[Figure: state before reference 2 (Ld R2 M[5]). Misses: 1, Hits: 0.]

Page 50: Block size for caches

Direct-mapped (REF 2)

[Figure: state after reference 2. The load maps to the same line, so the block with tag 1 (data 71, 150) replaces the block with tag 0, and R2 = 150. Misses: 2, Hits: 0.]

Page 51: Block size for caches

Direct-mapped (REF 3)

[Figure: state before reference 3 (St R2 M[2]). Misses: 2, Hits: 0.]

Page 52: Block size for caches

Direct-mapped (REF 3)

[Figure: state after reference 3. The store misses; the second line is allocated with tag 0, 150 is written into it (data 150, 123), and the line is marked dirty. Misses: 3, Hits: 0.]

Page 53: Block size for caches

Direct-mapped (REF 4)

[Figure: state before reference 4 (St R1 M[7]). Misses: 3, Hits: 0.]

Page 54: Block size for caches

Direct-mapped (REF 4)

[Figure: reference 4 misses on the dirty second line, so that block is written back and memory address 2 now holds 150. Misses: 4, Hits: 0.]

Page 55: Block size for caches

Direct-mapped (REF 4)

[Figure: state after reference 4. The second line now holds the block with tag 1; 29 is written into it (data 162, 29) and the line is dirty. Misses: 4, Hits: 0.]

Page 56: Block size for caches

Direct-mapped (REF 5)

[Figure: state before reference 5 (Ld R2 M[4]). Misses: 4, Hits: 0.]

Page 57: Block size for caches

Direct-mapped (REF 5)

[Figure: state after reference 5. The load hits in the block with tag 1 (data 71, 150), so R2 = 71. Final counts: Misses: 4, Hits: 1.]
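The direct-mapped walkthrough can be replayed with a short simulator. This is a sketch under stated assumptions, not the slides' own code: `run_direct_mapped`, `MEMORY`, and `REFS` are hypothetical names, and the model assumes 2 lines, 2-byte blocks, a write-back policy with a dirty bit, and allocate-on-write, which is what the frames appear to show.

```python
MEMORY = [78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225]
REFS = [("Ld", "R1", 1), ("Ld", "R2", 5), ("St", "R2", 2), ("St", "R1", 7), ("Ld", "R2", 4)]

def run_direct_mapped(refs, memory, num_lines=2, block_size=2):
    """Returns (misses, hits, registers); updates memory in place on write-backs."""
    lines = [None] * num_lines  # each entry: {"tag", "data", "dirty"}, or None if invalid
    regs = {}
    misses = hits = 0
    for op, reg, addr in refs:
        block, offset = divmod(addr, block_size)
        tag, index = divmod(block, num_lines)  # index = block mod num_lines
        line = lines[index]
        if line is not None and line["tag"] == tag:  # only one tag to check
            hits += 1
        else:
            misses += 1
            if line is not None and line["dirty"]:
                # Rebuild the victim's address from its tag and line index.
                start = (line["tag"] * num_lines + index) * block_size
                memory[start:start + block_size] = line["data"]
            start = block * block_size
            line = {"tag": tag, "data": memory[start:start + block_size], "dirty": False}
            lines[index] = line
        if op == "Ld":
            regs[reg] = line["data"][offset]
        else:  # "St"
            line["data"][offset] = regs[reg]
            line["dirty"] = True
    return misses, hits, regs

memory = list(MEMORY)
print(run_direct_mapped(REFS, memory))  # (4, 1, {'R1': 29, 'R2': 71})
print(memory[2])                        # 150
```

The first four references all miss because each pair of blocks competes for the same line, even though the cache is never asked to hold more than two blocks' worth of recently used data.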

Page 58: Block size for caches

Split the difference

• Direct mapped costs us in performance
  – Certain memory access patterns can turn out poorly
• Set associative caches:
  – Partition memory into regions
    • like direct mapped but fewer partitions
  – Associate a region to a set of cache blocks
    • Check tags for all blocks in a set to determine a HIT
• Treat each set like a small fully associative cache.
  – LRU (or LRU-like) policy generally used.

Page 59: Block size for caches

Set Associative Cache

[Figure: the 16-byte memory mapped onto a set-associative cache with two sets (0 and 1); each line has tag and data fields.]

Address fields: tag (2 bits), set index (1 bit), block offset (1 bit).

Page 60: Block size for caches

Set Associative Cache using the book’s style

[Figure: the same two-set cache drawn with the lines of each set side by side as Way 1 and Way 2, each with tag and data fields.]

Address fields: tag (2 bits), set index (1 bit), block offset (1 bit).

Page 61: Block size for caches

Set-associative cache example

[Figure: processor with registers R0-R3, a four-line set-associative cache with valid (V), dirty (d), tag, and data fields, and the same 16-byte memory as the store examples. All four lines are invalid. Misses: 0, Hits: 0.]

Reference stream: Ld R1 M[1]; Ld R2 M[5]; St R2 M[7]; St R1 M[4]; Ld R3 M[0]; Ld R2 M[8].

Page 62: Block size for caches

Set-associative cache (REF 1)

[Figure: state before reference 1 (Ld R1 M[1]). All lines are invalid. Misses: 0, Hits: 0.]

Page 63: Block size for caches

Set-associative cache (REF 1)

[Figure: state after reference 1. The block with tag 0 (data 78, 29) is loaded and R1 = 29. Misses: 1, Hits: 0.]

Page 64: Block size for caches

Set-associative cache (REF 2)

[Figure: state before reference 2 (Ld R2 M[5]). Misses: 1, Hits: 0.]

Page 65: Block size for caches

Set-associative cache (REF 2)

[Figure: state after reference 2. The block with tag 1 (data 71, 150) is loaded into the other line of the same set and R2 = 150. Misses: 2, Hits: 0.]

Page 66: Block size for caches

Set-associative cache (REF 3)

[Figure: state before reference 3 (St R2 M[7]). Misses: 2, Hits: 0.]

Page 67: Block size for caches

Set-associative cache (REF 3)

[Figure: state after reference 3. The store misses; a line in the other set is allocated with tag 1, 150 is written into it (data 162, 150), and the line is marked dirty. Misses: 3, Hits: 0.]

Page 68: Block size for caches

Set-associative cache (REF 4)

[Figure: state before reference 4 (St R1 M[4]). Misses: 3, Hits: 0.]

Page 69: Block size for caches

Set-associative cache (REF 4)

[Figure: state after reference 4. The store hits in the block with tag 1 that held 71, 150; the cached copy becomes 29, 150 and is marked dirty. Misses: 3, Hits: 1.]

Page 70: Block size for caches

Set-associative cache (REF 5)

[Figure: state before reference 5 (Ld R3 M[0]). Misses: 3, Hits: 1.]

Page 71: Block size for caches

Set-associative cache (REF 5)

[Figure: state after reference 5. The load hits in the block with tag 0, so R3 = 78. Misses: 3, Hits: 2.]

Page 72: Block size for caches

Set-associative cache (REF 6)

[Figure: state before reference 6 (Ld R2 M[8]). Misses: 3, Hits: 2.]

Page 73: Block size for caches

Set-associative cache (REF 6)

[Figure: processor registers (R0 to R3), cache (columns V, d, tag, data; lru markers), and memory (addresses 0 to 15). Reference stream: Ld R1 M[1]; Ld R2 M[5]; St R2 M[7]; St R1 M[4]; Ld R3 M[0]; Ld R2 M[8]. Memory now shows 29 where 71 appeared before. Misses: 3, Hits: 2.]

Page 74: Block size for caches

Set-associative cache (REF 6)

[Figure: processor registers (R0 to R3), cache (columns V, d, tag, data; lru markers), and memory (addresses 0 to 15). Reference stream: Ld R1 M[1]; Ld R2 M[5]; St R2 M[7]; St R1 M[4]; Ld R3 M[0]; Ld R2 M[8]. One cache line now shows tag 2 with data 18, 21. Misses: 4, Hits: 2.]
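The final counts of this walkthrough (Misses: 4, Hits: 2) can be reproduced with a small simulator. The sketch below is not from the slides. It assumes a geometry that this part of the deck does not state: 2 sets, 2 ways, 2-byte blocks, write-back with write-allocate, and LRU replacement. The names Cache, cache_access, and run_slide_trace are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed geometry (not stated on the slides). */
enum { NUM_SETS = 2, NUM_WAYS = 2, BLOCK_BYTES = 2 };

typedef struct {
    bool valid;
    bool dirty;          /* the "d" column: block modified since it was loaded */
    unsigned tag;
    unsigned last_used;  /* timestamp of the most recent access, used for LRU */
} Line;

typedef struct {
    Line lines[NUM_SETS][NUM_WAYS];
    unsigned clock;
    unsigned hits, misses, writebacks;
} Cache;

/* Access one byte address. A store marks the block dirty.
   Returns true on a hit, false on a miss. */
bool cache_access(Cache *c, unsigned addr, bool is_store)
{
    assert(c != NULL);
    unsigned block = addr / BLOCK_BYTES;
    unsigned set = block % NUM_SETS;
    unsigned tag = block / NUM_SETS;
    Line *ways = c->lines[set];
    c->clock++;

    for (int w = 0; w < NUM_WAYS; w++) {
        if (ways[w].valid && ways[w].tag == tag) {
            ways[w].last_used = c->clock;
            ways[w].dirty = ways[w].dirty || is_store;
            c->hits++;
            return true;
        }
    }

    /* Miss: use an invalid way if there is one, otherwise evict the LRU way. */
    Line *victim = &ways[0];
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!ways[w].valid) { victim = &ways[w]; break; }
        if (ways[w].last_used < victim->last_used) victim = &ways[w];
    }
    if (victim->valid && victim->dirty)
        c->writebacks++;  /* a dirty block must be written back to memory */

    victim->valid = true;
    victim->dirty = is_store;
    victim->tag = tag;
    victim->last_used = c->clock;
    c->misses++;
    return false;
}

/* Replay the slide's reference stream:
   Ld M[1], Ld M[5], St M[7], St M[4], Ld M[0], Ld M[8]. */
Cache run_slide_trace(void)
{
    static const struct { unsigned addr; bool is_store; } trace[] = {
        {1, false}, {5, false}, {7, true}, {4, true}, {0, false}, {8, false}
    };
    Cache c = {0};
    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
        cache_access(&c, trace[i].addr, trace[i].is_store);
    return c;
}
```

Under these assumptions the last load (M[8]) evicts the least recently used block of set 0, which the earlier store to M[4] left dirty, so one write-back is counted.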

Page 75: Block size for caches

Reasons for cache misses

• First reference to an address
  – Compulsory miss
    • Reduce by increasing block size
    • or pre-fetching
• Cache is too small to hold all the data
  – Capacity miss
    • Reduce misses by building a bigger cache
• Block was replaced from a busy set
  – Conflict miss
    • Reduce by increasing associativity
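One common way to tell the three kinds of miss apart is to run a fully associative LRU cache of the same capacity alongside the cache under study. The sketch below is not from the slides. It assumes a direct-mapped cache of NUM_LINES blocks, takes block numbers rather than byte addresses, and uses hypothetical names (Classifier, classify) and toy sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_LINES = 4, MAX_BLOCKS = 64 };  /* hypothetical toy sizes */

typedef enum { HIT, COMPULSORY, CAPACITY, CONFLICT } Outcome;

typedef struct {
    bool seen[MAX_BLOCKS];        /* block has been referenced at least once */
    bool dm_valid[NUM_LINES];     /* direct-mapped cache under study */
    unsigned dm_block[NUM_LINES];
    bool fa_valid[NUM_LINES];     /* fully associative LRU cache, same capacity */
    unsigned fa_block[NUM_LINES];
    unsigned fa_used[NUM_LINES];
    unsigned clock;
} Classifier;

/* Look up a block in the fully associative reference cache and update it.
   Returns true on a hit. */
static bool fa_access(Classifier *k, unsigned block)
{
    size_t victim = 0;
    k->clock++;
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (k->fa_valid[i] && k->fa_block[i] == block) {
            k->fa_used[i] = k->clock;
            return true;
        }
    }
    /* Miss: fill an empty line if any, otherwise replace the LRU line. */
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (!k->fa_valid[i]) { victim = i; break; }
        if (k->fa_used[i] < k->fa_used[victim]) victim = i;
    }
    k->fa_valid[victim] = true;
    k->fa_block[victim] = block;
    k->fa_used[victim] = k->clock;
    return false;
}

/* Classify one reference to a block number. */
Outcome classify(Classifier *k, unsigned block)
{
    assert(k != NULL && block < MAX_BLOCKS);
    bool first = !k->seen[block];
    k->seen[block] = true;

    bool fa_hit = fa_access(k, block);

    size_t line = block % NUM_LINES;
    bool dm_hit = k->dm_valid[line] && k->dm_block[line] == block;
    k->dm_valid[line] = true;
    k->dm_block[line] = block;

    if (dm_hit) return HIT;
    if (first) return COMPULSORY;        /* first reference to the block */
    return fa_hit ? CONFLICT : CAPACITY; /* would full associativity have kept it? */
}
```

For example, referencing blocks 0, 4, 0 in a fresh classifier gives two compulsory misses and then a conflict miss, because blocks 0 and 4 share a line in the direct-mapped cache while the fully associative cache still holds both.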

Page 76: Block size for caches

Sample Cache hit rates

[Chart: cache miss rate (y-axis, ticks 0 to 30) versus cache size, block data only (x-axis: 8K, 16K, 32K, 64K bytes), with series for direct mapped, 4-way set associative, and fully associative caches.]

Page 77: Block size for caches

Itanium-2 On-chip Caches (Original)

• L1, 16KB, 4-way s.a., 64B line
  – quad-port (2 load+2 store)
  – L1D for data, L1I for instructions
  – 1 cycle latency
• L2, 256KB, 4-way s.a., 128B line
  – quad-port (4 load or 4 store)
  – 5 cycle latency
• L3, 3MB, 12-way s.a., 128B line
  – single 32B port
  – 12 cycle latency
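The number of sets in each of these caches follows from capacity / (ways × line size). The sketch below is not from the slides. It assumes 1 KB = 1024 bytes and that the quoted sizes count block data only; num_sets and bits_for are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sets = capacity / (ways * line size), capacity and line in bytes. */
size_t num_sets(size_t capacity_bytes, size_t ways, size_t line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    assert(capacity_bytes % (ways * line_bytes) == 0);
    return capacity_bytes / (ways * line_bytes);
}

/* Address bits needed to select one of n items; n must be a power of two. */
unsigned bits_for(size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}
```

With the figures listed above, num_sets(16 * 1024, 4, 64) is 64 for L1, so an L1 address would use 6 block-offset bits and 6 set-index bits. L2 works out to 512 sets and L3 to 2048 sets.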

Page 78: Block size for caches

Itanium-2 On-chip Caches (More Recent)

• L1, 16KB, 4-way s.a., 64B line
  – quad-port (2 load+2 store)
  – L1D for data, L1I for instructions
  – 2 cycle latency
• L2, 96KB, 4-way s.a., 128B line
  – quad-port (4 load or 4 store)
  – 9 cycle latency
• L3, 4MB, 12-way s.a., 128B line
  – single 32B port
  – 24 cycle latency

Page 79: Block size for caches

Specializing Caches (Intel Pentium 4 Trace Cache)

• Intel IA-32 (x86) instructions are CISC (Complex)
  – They can take many cycles to decode because of complexity and variable length
• Current (and recent) Intel chips have adopted a “RISC-like” organization
  – Different Interior Instruction Set (using “micro-ops”)
  – Exterior Instruction Set remains the same (for compatibility)
• Need sophisticated (and slow) hardware to translate between the two instruction sets
• Intel introduced their Trace Cache to avoid having to repeatedly do this translation

Page 80: Block size for caches

Specializing Caches (Intel Pentium 4 Trace Cache)

• The first time an instruction enters the processor, it is decoded into its respective microinstructions
  – SAVE the string of micro-ops in a cache!
  – String together multiple micro-ops in a sequential order called a trace
    • Break up traces by “basic blocks”
• If that instruction is accessed again, just grab the micro-ops directly from the trace cache.
• The trace cache operates in much the same way as an L1 instruction cache
  – There is a bigger penalty for missing, since you have to:
    • Load from L2 cache
    • Decode into micro-ops
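The decode-once, reuse-later idea can be sketched as a small cache keyed by instruction address. This is a toy model, not the Pentium 4's design: it is direct-mapped with made-up sizes, slow_decode is a hypothetical stand-in for the real x86 decoder, and the micro-ops are opaque numbers. All names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TRACE_LINES = 8, OPS_PER_LINE = 6 };  /* toy sizes, not the real hardware's */

typedef struct {
    bool valid;
    unsigned addr;               /* address of the instruction that was decoded */
    unsigned ops[OPS_PER_LINE];  /* saved micro-ops (opaque numbers here) */
    size_t num_ops;
} TraceLine;

typedef struct {
    TraceLine lines[TRACE_LINES];
    unsigned decodes;            /* how many times the slow decoder ran */
} TraceCache;

/* Hypothetical stand-in for the slow decoder: invents 1 to 3 micro-ops. */
static size_t slow_decode(unsigned addr, unsigned ops[OPS_PER_LINE])
{
    size_t n = addr % 3 + 1;
    for (size_t i = 0; i < n; i++)
        ops[i] = addr * 10 + (unsigned)i;
    return n;
}

/* Return the micro-ops for the instruction at addr.
   The decoder runs only when the trace cache misses. */
const TraceLine *fetch(TraceCache *tc, unsigned addr)
{
    assert(tc != NULL);
    TraceLine *line = &tc->lines[addr % TRACE_LINES];
    if (!(line->valid && line->addr == addr)) {
        line->num_ops = slow_decode(addr, line->ops);  /* miss: decode and save */
        line->addr = addr;
        line->valid = true;
        tc->decodes++;
    }
    return line;  /* hit, or freshly filled line */
}
```

Fetching the same address twice runs slow_decode once; the second fetch returns the saved micro-ops directly.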

Page 81: Block size for caches

Specializing Caches (Intel Pentium 4 Trace Cache)

Trace Cache:
• 12k ops
• 8 way s.a.
• 256 lines
• “block size” is 6 ops

~ 80 KBytes

Page 82: Block size for caches

Pitfall: How you access a 2-D array

In C/C++:

int bigArray[100][16];

How do we map this to a 1-D “storage array” (AKA memory)?
C/C++ uses row-major order – store the first row, then the second, etc.

When accessing a row, how does a cache do?
When accessing a column, how does a cache do?
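The row-major mapping, and its effect on which cache blocks a traversal touches, can be sketched as follows. This is an illustration, not part of the slides: flat_index and block_changes are hypothetical helpers, the block size is a parameter chosen by the caller, and the array is assumed to start on a block boundary.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 100, COLS = 16 };

int bigArray[ROWS][COLS];

/* Row-major mapping: element [r][c] lives at flat index r * COLS + c. */
size_t flat_index(size_t r, size_t c)
{
    assert(r < ROWS && c < COLS);
    return r * COLS + c;
}

/* Walk `count` ints starting at flat index `start`, stepping `stride` ints
   each time, and count how often consecutive accesses land in different
   cache blocks. Assumes the array begins on a block boundary. */
size_t block_changes(size_t start, size_t stride, size_t count, size_t block_bytes)
{
    assert(block_bytes > 0);
    size_t changes = 0;
    size_t prev = (start * sizeof(int)) / block_bytes;
    for (size_t i = 1; i < count; i++) {
        size_t cur = ((start + i * stride) * sizeof(int)) / block_bytes;
        if (cur != prev)
            changes++;
        prev = cur;
    }
    return changes;
}
```

If a cache block happens to hold exactly one row (COLS * sizeof(int) bytes), walking along row 0 is block_changes(flat_index(0, 0), 1, COLS, COLS * sizeof(int)) and never leaves its block. Walking down column 0 is block_changes(flat_index(0, 0), COLS, ROWS, COLS * sizeof(int)) and moves to a new block on every step.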