CPUs
© 2008 Wayne Wolf. Overheads for Computers as Components, 2nd ed.
CPUs
Caches. Memory management.
Caches and CPUs
[Diagram: the CPU exchanges address and data with a cache controller and cache; the cache controller exchanges address and data with main memory.]
Cache operation
Many main memory locations are mapped onto one cache entry.
May have caches for: instructions; data; data + instructions (unified).
Memory access time is no longer deterministic.
Terms
Cache hit: required location is in cache.
Cache miss: required location is not in cache.
Working set: set of locations used by program in a time interval.
Types of misses
Compulsory (cold): location has never been accessed.
Capacity: working set is too large.
Conflict: multiple locations in working set map to the same cache entry.
Memory system performance
h = cache hit rate. t_cache = cache access time, t_main = main memory access time.
Average memory access time:
t_av = h t_cache + (1 - h) t_main
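As a worked check of the formula (the helper and the numbers in the example are illustrative, not from the slides):

```c
#include <assert.h>
#include <math.h>

/* Average memory access time for a single-level cache:
   t_av = h * t_cache + (1 - h) * t_main
   Hits cost the cache access time; misses cost the main-memory access time. */
double t_av(double h, double t_cache, double t_main) {
    return h * t_cache + (1.0 - h) * t_main;
}
```

For h = 0.9, t_cache = 1 cycle, t_main = 50 cycles, t_av = 0.9·1 + 0.1·50 = 5.9 cycles: even a 10% miss rate dominates the average.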
Multiple levels of cache
[Diagram: CPU connected to L1 cache, which connects to L2 cache.]
Multi-level cache access time
h1 = L1 hit rate.
h2 = combined hit rate of L1 and L2 (fraction of accesses satisfied by either level).
Average memory access time:
t_av = h1 t_L1 + (h2 - h1) t_L2 + (1 - h2) t_main
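The same check for two levels; note that h2 must be the combined hit rate of L1 and L2 for the (1 - h2) main-memory term to come out right (helper and numbers are illustrative, not from the slides):

```c
#include <assert.h>
#include <math.h>

/* Two-level cache: h1 of accesses hit L1, (h2 - h1) miss L1 but hit L2,
   and (1 - h2) go all the way to main memory. */
double t_av2(double h1, double h2, double tL1, double tL2, double tmain) {
    return h1 * tL1 + (h2 - h1) * tL2 + (1.0 - h2) * tmain;
}
```

For h1 = 0.9, h2 = 0.98, tL1 = 1, tL2 = 10, tmain = 100: t_av = 0.9 + 0.8 + 2.0 = 3.7 cycles.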
Replacement policies
Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location.
Two popular strategies: Random. Least-recently used (LRU).
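One common way to model LRU is with per-way age counters; this sketch (mine, not from the slides) tracks a single n-way set:

```c
#include <assert.h>
#include <stdint.h>

#define WAYS 4

/* One n-way set: per-way tag, valid bit, and an age counter that
   counts accesses since the way was last used. */
typedef struct {
    uint32_t tag[WAYS];
    int      valid[WAYS];
    int      age[WAYS];
} lru_set;

/* Touch `tag`; returns 1 on a hit, 0 on a miss (after installing the
   tag in the least-recently-used way). */
int lru_access(lru_set *s, uint32_t tag) {
    int victim = 0;
    for (int i = 0; i < WAYS; i++) s->age[i]++;   /* everyone ages */
    for (int i = 0; i < WAYS; i++) {
        if (s->valid[i] && s->tag[i] == tag) { s->age[i] = 0; return 1; }
    }
    for (int i = 0; i < WAYS; i++) {              /* pick the LRU victim */
        if (!s->valid[i]) { victim = i; break; }  /* prefer an empty way */
        if (s->age[i] > s->age[victim]) victim = i;
    }
    s->valid[victim] = 1; s->tag[victim] = tag; s->age[victim] = 0;
    return 0;
}
```

Random replacement needs no per-way state at all, which is why hardware sometimes prefers it; LRU needs ordering information per set.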
Cache organizations
Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented).
Direct-mapped: each memory location maps onto exactly one cache entry.
N-way set-associative: each memory location can go into one of n sets.
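In the direct-mapped and set-associative cases the cache entry or set is chosen by address bits. A sketch with an assumed geometry (32-byte blocks, 256 lines, so 5 offset bits and 8 index bits; these sizes are not from the slides):

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 5   /* 32-byte block */
#define INDEX_BITS  8   /* 256 lines     */

/* Byte position within the block. */
static inline uint32_t cache_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);
}
/* Which line (or set) the address maps to. */
static inline uint32_t cache_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}
/* Remaining high bits, stored and compared to detect a hit. */
static inline uint32_t cache_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```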
Cache performance benefits
Keep frequently-accessed locations in fast cache.
Cache retrieves more than one word at a time.
Sequential accesses are faster after the first access.
Direct-mapped cache
[Diagram: the address splits into tag, index, and offset. The index selects a cache line holding a valid bit, a tag (e.g. 0xabcd), and a cache block of several bytes; the stored tag is compared (=) with the address tag to produce the hit signal, and the offset selects the value byte from the block.]
Write operations
Write-through: immediately copy write to main memory.
Write-back: write to main memory only when location is removed from cache.
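A minimal sketch of the write-back bookkeeping (struct and names are mine): a write hit only sets a dirty bit, and main memory is updated only when a dirty line is evicted.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t tag; int valid, dirty; } cache_line;

/* Write-back policy: a write hit just marks the line dirty;
   no main-memory traffic happens yet. */
void write_hit(cache_line *l) { l->dirty = 1; }

/* On eviction, return 1 if the old contents must first be copied
   back to main memory (the line is valid and was modified). */
int evict_needs_writeback(const cache_line *l) {
    return l->valid && l->dirty;
}
```

Write-through needs no dirty bit, since memory is already up to date on every write, at the cost of more memory traffic.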
Direct-mapped cache locations
Many locations map onto the same cache block.
Conflict misses are easy to generate:
Array a[] uses locations 0, 1, 2, …
Array b[] uses locations 1024, 1025, 1026, …
Operation a[i] + b[i] generates conflict misses.
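Assuming a direct-mapped cache of 1024 single-word lines (a geometry consistent with the addresses above), the clash is easy to compute:

```c
#include <assert.h>
#include <stdint.h>

#define LINES 1024  /* assumed: direct-mapped, one word per line */

/* Line index for a word address: low 10 bits of the address. */
uint32_t line_of(uint32_t word_addr) { return word_addr % LINES; }
```

Since a[i] lives at word i and b[i] at word 1024 + i, both map to line i, so every a[i] + b[i] pair evicts the other operand's line.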
Set-associative cache
A set of direct-mapped caches:
[Diagram: Set 1, Set 2, …, Set n checked in parallel; their outputs combine to produce the hit signal and data.]
Example: direct-mapped vs. set-associative
address  data
000      0101
001      1111
010      0000
011      0110
100      1000
101      0001
110      1010
111      0100
Direct-mapped cache behavior
After 001 access:
block  tag  data
00     -    -
01     0    1111
10     -    -
11     -    -

After 010 access:
block  tag  data
00     -    -
01     0    1111
10     0    0000
11     -    -
Direct-mapped cache behavior, cont’d.
After 011 access:
block  tag  data
00     -    -
01     0    1111
10     0    0000
11     0    0110

After 100 access:
block  tag  data
00     1    1000
01     0    1111
10     0    0000
11     0    0110
Direct-mapped cache behavior, cont’d.
After 101 access:
block  tag  data
00     1    1000
01     1    0001
10     0    0000
11     0    0110

After 111 access:
block  tag  data
00     1    1000
01     1    0001
10     0    0000
11     1    0100
2-way set-associative cache behavior
Final state of cache (twice as big as direct-mapped):
set  blk 0 tag  blk 0 data  blk 1 tag  blk 1 data
00   1          1000        -          -
01   0          1111        1          0001
10   0          0000        -          -
11   0          0110        1          0100
2-way set-associative cache behavior
Final state of cache (same size as direct-mapped):
set  blk 0 tag  blk 0 data  blk 1 tag  blk 1 data
0    01         0000        10         1000
1    10         0001        11         0100
Example caches
StrongARM:
  16 Kbyte, 32-way, 32-byte block instruction cache.
  16 Kbyte, 32-way, 32-byte block data cache (write-back).
C55x:
  Various models have 16 KB, 24 KB cache. Can be used as scratch pad memory.
Scratch pad memories
Alternative to cache: software determines what is stored in scratch pad.
Provides predictable behavior at the cost of software control.
C55x cache can be configured as scratch pad.
Memory management units
Memory management unit (MMU) translates addresses:
[Diagram: the CPU issues a logical address to the memory management unit, which presents the physical address to main memory.]
Memory management tasks
Allows programs to move in physical memory during execution.
Allows virtual memory: memory images kept in secondary storage; images returned to main memory on demand during execution.
Page fault: request for a location not resident in main memory.
Address translation
Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses.
Two basic schemes: segmented; paged.
Segmentation and paging can be combined (x86).
Segments and pages
[Diagram: a memory map containing segment 1, segment 2, and pages 1 and 2.]
Segment address translation
[Diagram: the segment base address is added (+) to the logical address to produce the physical address; a range check against the segment lower and upper bounds signals a range error on failure.]
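The slide's datapath can be mimicked in C (struct layout and names are mine):

```c
#include <assert.h>
#include <stdint.h>

/* A segment: base address plus lower/upper bounds for the range check. */
typedef struct { uint32_t base, lower, upper; } segment;

/* Add the base to the logical address, then range-check the result.
   Returns 0 and writes *phys on success, -1 on a range error. */
int seg_translate(const segment *s, uint32_t logical, uint32_t *phys) {
    uint32_t p = s->base + logical;
    if (p < s->lower || p > s->upper) return -1;  /* range error */
    *phys = p;
    return 0;
}
```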
Page address translation
[Diagram: the logical address splits into page number and offset; the page number selects the page i base address from the page table, which is concatenated with the unchanged offset to form the physical address.]
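A sketch of the concatenation, assuming 4 KB pages (12-bit offset) and a simple flat table of page base numbers (page size and table layout are assumptions, not from the slides):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumed 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* page_table[i] holds the physical page base number for logical page i. */
uint32_t page_translate(const uint32_t *page_table, uint32_t logical) {
    uint32_t page   = logical >> PAGE_SHIFT;   /* page number */
    uint32_t offset = logical & PAGE_MASK;     /* unchanged offset */
    return (page_table[page] << PAGE_SHIFT) | offset;  /* concatenate */
}
```

Unlike segmentation there is no addition: the offset bits pass straight through, which keeps the translation hardware simple.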
Page table organizations
[Diagram: flat and tree page table organizations; a flat table indexes page descriptors directly, while a tree reaches a page descriptor through multiple levels.]
Caching address translations
Large translation tables require main memory access.
TLB (translation lookaside buffer): a cache for address translations. Typically small.
ARM memory management
Memory region types: section: 1 Mbyte block; large page: 64 kbytes; small page: 4 kbytes.
An address is marked as section-mapped or page-mapped.
Two-level translation scheme.
ARM address translation
[Diagram: the virtual address splits into 1st index, 2nd index, and offset. The translation table base register locates the 1st-level table; the 1st index selects a 1st-level descriptor, which points to a 2nd-level table; the 2nd index selects a 2nd-level descriptor, which is concatenated with the offset to form the physical address.]
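A greatly simplified model of the two-level walk in the spirit of the slide (the 12/8/12-bit field split and descriptor layout are illustrative assumptions, not the real ARM descriptor formats):

```c
#include <assert.h>
#include <stdint.h>

/* l1: table of pointers to 2nd-level tables, indexed by the top address
   bits; each 2nd-level entry holds a 4 KB-aligned page base address. */
uint32_t walk(uint32_t **l1, uint32_t vaddr) {
    uint32_t i1  = vaddr >> 20;            /* 1st-level index  */
    uint32_t i2  = (vaddr >> 12) & 0xFFu;  /* 2nd-level index  */
    uint32_t off = vaddr & 0xFFFu;         /* page offset      */
    uint32_t *l2 = l1[i1];                 /* descriptor -> 2nd-level table */
    return (l2[i2] & ~0xFFFu) | off;       /* concatenate page base, offset */
}
```

In the real MMU each step is a memory access, which is why TLBs matter: a miss can cost two extra main-memory reads before the actual access begins.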