Memory Hierarchy


Transcript of Memory Hierarchy

Page 1: Memory Hierarchy

1

Memory Hierarchy

Page 2: Memory Hierarchy

2

Outline

• Random-Access Memory (RAM)

• Nonvolatile Memory

• Disk Storage

• Locality

• Memory hierarchy

• Suggested Reading: 6.1, 6.2, 6.3


Page 3: Memory Hierarchy

3

6.1 Storage Technologies

Page 4: Memory Hierarchy

4

6.1.1 Random-Access Memory

Page 5: Memory Hierarchy

5

Random-Access Memory (RAM)

• Key features

– RAM is packaged as a chip.

– Basic storage unit is a cell (one bit per cell).

– Multiple RAM chips form a memory.

Page 6: Memory Hierarchy

6

Random-Access Memory (RAM)

• Static RAM (SRAM)– Each cell stores bit with a six-transistor circuit.

– Retains value indefinitely, as long as it is kept powered.

– Relatively insensitive to disturbances such as electrical noise.

– Faster and more expensive than DRAM.

Page 7: Memory Hierarchy

7

Random-Access Memory (RAM)

• Dynamic RAM (DRAM)

– Each cell stores bit with a capacitor and

transistor.

– Value must be refreshed every 10-100 ms.

– Sensitive to disturbances.

– Slower and cheaper than SRAM.

Page 8: Memory Hierarchy

8

SRAM vs DRAM summary

        Trans.   Access
        per bit  time    Persist?  Sensitive?  Cost  Applications
SRAM    6        1X      Yes       No          100X  Cache memories
DRAM    1        10X     No        Yes         1X    Main memories, frame buffers

Figure 6.2 P458

Page 9: Memory Hierarchy

9

Conventional DRAM organization

• d x w DRAM

– dw total bits organized as d supercells of size w bits

[Diagram: a 16 x 8 DRAM chip organized as a 4 x 4 array of supercells (rows 0-3, cols 0-3), with supercell (2,1) highlighted. The memory controller sends a 2-bit address (addr) and exchanges 8 bits of data with the chip; a selected row is staged in an internal row buffer before data flows to the CPU.]

Figure 6.3 P459

Page 10: Memory Hierarchy

10

Reading DRAM supercell (2,1)

• Step 1(a): Row access strobe (RAS) selects row 2.

• Step 1(b): Row 2 copied from DRAM array to row buffer.

[Diagram: the memory controller drives RAS = 2 on the 2-bit addr lines of the 16 x 8 DRAM chip; row 2 of the 4 x 4 supercell array is copied into the internal row buffer.]

Figure 6.4 (a) P460

Page 11: Memory Hierarchy

11

Reading DRAM supercell (2,1)

• Step 2(a): Column access strobe (CAS) selects column 1.

• Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.

[Diagram: the memory controller drives CAS = 1 on the addr lines; supercell (2,1) is read out of the internal row buffer onto the 8-bit data lines.]

Figure 6.4 (b) P460
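The two-step RAS/CAS read described above can be sketched as a toy model in C. This is only an illustration of the protocol, not real controller code; the type and function names (dram16x8, ras, cas, read_supercell) are mine, modeled on the 16 x 8 chip of Figure 6.3:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROWS 4
#define COLS 4

/* Toy model of a 16 x 8 DRAM chip: 16 supercells of 8 bits each,
 * arranged as a 4 x 4 array, plus an internal row buffer. */
typedef struct {
    uint8_t cells[ROWS][COLS];
    uint8_t row_buffer[COLS];
} dram16x8;

/* Step 1: RAS copies an entire row into the internal row buffer. */
void ras(dram16x8 *d, int row) {
    memcpy(d->row_buffer, d->cells[row], COLS);
}

/* Step 2: CAS selects one supercell from the buffered row. */
uint8_t cas(const dram16x8 *d, int col) {
    return d->row_buffer[col];
}

/* Reading supercell (2,1) is RAS 2 followed by CAS 1. */
uint8_t read_supercell(dram16x8 *d, int row, int col) {
    ras(d, row);
    return cas(d, col);
}
```

Note that once a row has been staged by RAS, repeated CAS reads hit the same buffered row without touching the array again; that observation is exactly what fast page mode DRAM (next slides) exploits.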

Page 12: Memory Hierarchy

12

Memory modules

[Diagram: a 64 MB memory module consisting of eight 8M x 8 DRAM chips. The memory controller broadcasts the same addr (row = i, col = j) to every chip, and each chip contributes one byte of supercell (i, j) — DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, ..., DRAM 7 bits 56-63 — forming the 64-bit doubleword at main memory address A.]

64-bit doubleword to CPU chip

Figure 6.5 P461

Page 13: Memory Hierarchy

13

Enhanced DRAMs

• All enhanced DRAMs are built around the

conventional DRAM core

• Fast page mode DRAM (FPM DRAM)

– Access contents of row with [RAS, CAS, CAS, CAS, CAS] instead of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)].

Page 14: Memory Hierarchy

14

Enhanced DRAMs

• Extended data out DRAM (EDO DRAM)

– Enhanced FPM DRAM with more closely spaced CAS signals.

• Synchronous DRAM (SDRAM)

– Driven with rising clock edge instead of asynchronous control signals.

Page 15: Memory Hierarchy

15

Enhanced DRAMs

• Double data-rate synchronous DRAM (DDR SDRAM)

– Enhancement of SDRAM that uses both clock edges as control signals.

• Video RAM (VRAM)

– Like FPM DRAM, but output is produced by shifting the row buffer

– Dual ported (allows concurrent reads and writes)

Page 16: Memory Hierarchy

16

Nonvolatile memories

• DRAM and SRAM are volatile memories

– Lose information if powered off.

• Nonvolatile memories retain value even if powered off

– Generic name is read-only memory (ROM).

– Misleading because some ROMs can be read and modified.


Page 17: Memory Hierarchy

17

Nonvolatile memories

• Types of ROMs

– Programmable ROM (PROM)

– Erasable programmable ROM (EPROM)

– Electrically erasable PROM (EEPROM)

– Flash memory

• Firmware

– Program stored in a ROM

• Boot time code, BIOS (basic input/output system)

• Graphics cards, disk controllers

Page 18: Memory Hierarchy

18

Bus Structure Connecting CPU and memory

• A bus is a collection of parallel wires that carry address, data, and control signals

• Buses are typically shared by multiple devices

Page 19: Memory Hierarchy

19

Bus Structure Connecting CPU and memory P464

[Diagram: the CPU chip (register file, ALU, bus interface) connects over the system bus to an I/O bridge, which connects over the memory bus to main memory.]

Page 20: Memory Hierarchy

20

Memory read transaction (1) Figure 6.7 P465

• CPU places address A on the memory bus

[Diagram: the CPU's bus interface drives address A onto the bus; main memory holds word x at address A; register %eax is the destination.]

Load operation: movl A, %eax

Page 21: Memory Hierarchy

21

Memory read transaction (2) Figure 6.7 P465

• Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

[Diagram: main memory places word x on the memory bus; x travels through the I/O bridge toward the CPU's bus interface.]

Load operation: movl A, %eax

Page 22: Memory Hierarchy

22

Memory read transaction (3) Figure 6.7 P465

• CPU reads word x from the bus and copies it into register %eax.

[Diagram: x arrives at the bus interface and is copied into register %eax; main memory still holds x at address A.]

Load operation: movl A, %eax

Page 23: Memory Hierarchy

23

Memory write transaction (1)

• CPU places address A on the bus.

• Main memory reads it and waits for the corresponding data word to arrive.

Page 24: Memory Hierarchy

24

Memory write transaction (1) Figure 6.8 P466

[Diagram: register %eax holds y; the bus interface places address A on the bus; main memory waits at address A for the data.]

Store operation: movl %eax, A

Page 25: Memory Hierarchy

25

Memory write transaction (2) Figure 6.8 P466

• CPU places data word y on the bus.

[Diagram: register %eax holds y; the bus interface places data word y on the bus.]

Store operation: movl %eax, A

Page 26: Memory Hierarchy

26

Memory write transaction (3) Figure 6.8 P466

• Main memory reads data word y from the bus and stores it at address A.

[Diagram: y arrives at main memory and is stored at address A.]

Store operation: movl %eax, A

Page 27: Memory Hierarchy

27

6.1.2 Disk Storage

Page 28: Memory Hierarchy

28

Disk geometry

• Disks consist of platters, each with two surfaces.

• Each surface consists of concentric rings called tracks.

• Each track consists of sectors separated by gaps.


Page 29: Memory Hierarchy

29

Disk geometry Figure 6.9 (a) P467

[Diagram: a single surface rotating about a spindle, showing concentric tracks, a highlighted track k, and its sectors separated by gaps.]

Page 30: Memory Hierarchy

30

Disk geometry (multiple-platter view)

• Aligned tracks form a cylinder.

[Diagram: three platters (0-2) on a common spindle, giving six surfaces (0-5); the aligned track k on each surface forms cylinder k.]

Figure 6.9 (b) P467

Page 31: Memory Hierarchy

31

Disk capacity

• Capacity

– Maximum number of bits that can be stored.

– Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.

Page 32: Memory Hierarchy

32

Disk capacity

• Capacity is determined by these technology factors:

– Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track.

– Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment.

– Areal density (bits/in^2): product of recording density and track density.

Page 33: Memory Hierarchy

33

Disk capacity

• Modern disks partition tracks into disjoint subsets called recording zones

– Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.

– Each zone has a different number of sectors/track.


Page 34: Memory Hierarchy

34

Computing disk capacity

• Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)

Page 35: Memory Hierarchy

35

Computing disk capacity

• Example:

– 512 bytes/sector

– 300 sectors/track (on average)

– 20,000 tracks/surface

– 2 surfaces/platter

– 5 platters/disk

– Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
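The slide's arithmetic is easy to check mechanically. A minimal helper (the function and parameter names are mine, not from the text) that evaluates the capacity formula:

```c
#include <assert.h>
#include <stdint.h>

/* Capacity = bytes/sector x avg sectors/track x tracks/surface
 *            x surfaces/platter x platters/disk.
 * 64-bit arithmetic is needed: the result exceeds 2^32. */
uint64_t disk_capacity(uint64_t bytes_per_sector,
                       uint64_t sectors_per_track,
                       uint64_t tracks_per_surface,
                       uint64_t surfaces_per_platter,
                       uint64_t platters_per_disk) {
    return bytes_per_sector * sectors_per_track * tracks_per_surface
         * surfaces_per_platter * platters_per_disk;
}
```

With the slide's parameters, disk_capacity(512, 300, 20000, 2, 5) yields 30,720,000,000 bytes, i.e. 30.72 GB under the vendors' 1 GB = 10^9 bytes convention.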

Page 36: Memory Hierarchy

36

Disk operation (single-platter view)

• The disk surface spins at a fixed rotational rate about the spindle.

• By moving radially, the arm can position the read/write head over any track.

• The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.

Figure 6.10 (a) P469

Page 37: Memory Hierarchy

37

Disk operation (multi-platter view)

• The read/write heads are attached to arms and move in unison from cylinder to cylinder about the spindle.

Figure 6.10 (b) P469

Page 38: Memory Hierarchy

38

Disk access time

• Average time to access some target sector is approximated by

– Taccess = Tavg seek + Tavg rotation + Tavg transfer

• Seek time

– Time to position heads over the cylinder containing the target sector.

– Typical Tavg seek = 9 ms

Page 39: Memory Hierarchy

39

Disk access time

• Rotational latency

– Time waiting for the first bit of the target sector to pass under the r/w head.

– Tavg rotation = 1/2 x (1/RPM) x (60 secs/1 min)

• Transfer time

– Time to read the bits in the target sector.

– Tavg transfer = (1/RPM) x (1/(avg # sectors/track)) x (60 secs/1 min)

Page 40: Memory Hierarchy

40

Disk access time example

• Given:

– Rotational rate = 7,200 RPM

– Average seek time = 9 ms

– Avg # sectors/track = 400

• Derived:

– Tavg rotation = 1/2 x (60 secs / 7,200 RPM) x 1,000 ms/sec ≈ 4 ms

– Tavg transfer = (60 / 7,200 RPM) x (1/400 secs/track) x 1,000 ms/sec ≈ 0.02 ms

– Taccess = 9 ms + 4 ms + 0.02 ms ≈ 13 ms

Page 41: Memory Hierarchy

41

Disk access time example

• Important points:

– Access time dominated by seek time and rotational latency.

– First bit in a sector is the most expensive; the rest are free.

– SRAM access time is about 4 ns/doubleword, DRAM about 60 ns.

– Disk is about 40,000 times slower than SRAM, and about 2,500 times slower than DRAM.

Page 42: Memory Hierarchy

42

Logical disk blocks

• Modern disks present a simpler abstract view of the complex sector geometry:

– The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)

• Mapping between logical blocks and actual (physical) sectors

– Maintained by a hardware/firmware device called the disk controller.

– Converts requests for logical blocks into (surface, track, sector) triples.
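A real disk controller does this translation with per-zone tables and spare-sector remapping, but the idea can be sketched for an idealized uniform geometry. All names here are illustrative, and the cylinder-major ordering is an assumption chosen because it keeps consecutive logical blocks in the same cylinder:

```c
#include <assert.h>

/* Idealized logical-block-to-(surface, track, sector) mapping for a
 * disk with a fixed number of sectors per track. Real controllers
 * use zone tables and set aside spare cylinders (next slide). */
typedef struct { int surface, track, sector; } chs;

chs map_block(long block, int sectors_per_track, int surfaces) {
    chs g;
    long per_cyl = (long)sectors_per_track * surfaces; /* blocks per cylinder */
    long cyl  = block / per_cyl;   /* fill one cylinder before seeking on */
    long rest = block % per_cyl;
    g.track   = (int)cyl;          /* track index equals cylinder index */
    g.surface = (int)(rest / sectors_per_track);
    g.sector  = (int)(rest % sectors_per_track);
    return g;
}
```

Filling a whole cylinder before moving the arm means sequential logical blocks can be read with no seeks, which is why sequential disk access is so much cheaper than random access.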

Page 43: Memory Hierarchy

43

Logical disk blocks

• Allows controller to set aside spare cylinders for each zone

– Accounts for the difference between “formatted capacity” and “maximum capacity”

Page 44: Memory Hierarchy

44

Bus structure connecting I/O and CPU

[Diagram: the CPU chip (register file, ALU, bus interface) connects via the system bus to an I/O bridge, which connects to main memory over the memory bus and to an I/O bus. On the I/O bus sit a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Figure 6.11 P472

Page 45: Memory Hierarchy

45

Reading a disk sector (1)

[Diagram: CPU chip, main memory, and the I/O bus with its USB controller, graphics adapter, and disk controller; the command travels from the CPU's bus interface to the disk controller.]

CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.

Figure 6.12 (a) P473

Page 46: Memory Hierarchy

46

Reading a disk sector (2)

[Diagram: data flows from the disk, through the disk controller, across the I/O bus and I/O bridge, directly into main memory, bypassing the CPU.]

Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

Figure 6.12 (b) P473

Page 47: Memory Hierarchy

47

Reading a disk sector (3)

[Diagram: the disk controller signals the CPU chip over the I/O bus once the DMA transfer into main memory is complete.]

When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special “interrupt” pin on the CPU)

Figure 6.12 (c) P474

Page 48: Memory Hierarchy

48

6.1.3 Storage Technology Trends

Page 49: Memory Hierarchy

49

6.2 Locality

Page 50: Memory Hierarchy

50

Locality

• Data locality

int sumvec(int v[N])
{
    int i, sum = 0;

    for (i = 0; i < N; i++)
        sum += v[i];
    return sum;
}

Figure 6.17 (a) P479

Page 51: Memory Hierarchy

51

Locality

• Data locality

Address    Contents    Access order
0          v[0]        1
4          v[1]        2
8          v[2]        3
12         v[3]        4
16         v[4]        5
20         v[5]        6
24         v[6]        7
28         v[7]        8

Figure 6.17 (b) P479

Page 52: Memory Hierarchy

52

Locality

• Principle of locality

– Programs tend to reference data items

• that are near other recently referenced data items

• that were recently referenced themselves

Page 53: Memory Hierarchy

53

Locality

• Two forms of locality

– Temporal locality

• A memory location that is referenced once is likely to be referenced again multiple times in the near future

– Spatial locality

• If a memory location is referenced once, the program is likely to reference a nearby memory location in the near future

Page 54: Memory Hierarchy

54

Locality

• All levels of modern computer systems are designed to exploit locality

– Hardware

• Cache memory (to speed up main memory accesses)

– Operating systems

• Use main memory to speed up virtual address space accesses

• Use main memory to speed up disk file accesses

– Application programs

• Web browsers exploit temporal locality by caching recently referenced documents on a local disk

Page 55: Memory Hierarchy

55

Locality

• Locality in the example

– sum: temporal locality

– v: spatial locality

• Stride-1 reference pattern

• Stride-k reference pattern

– Visiting every k-th element of a contiguous vector

– As the stride increases, the spatial locality decreases
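The stride-1 and stride-k patterns can be contrasted in code. This is a sketch (the function names are mine); both functions sum every element exactly once, so only the visiting order, and hence the spatial locality, differs:

```c
#include <assert.h>

#define N 16

/* Stride-1 reference pattern: consecutive addresses, best
 * spatial locality. */
int sum_stride1(const int v[N]) {
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += v[i];
    return sum;
}

/* Stride-k reference pattern: visits every k-th element, making
 * k passes so each element is still summed exactly once. As k
 * grows, successive references land farther apart in memory and
 * spatial locality decreases. */
int sum_stridek(const int v[N], int k) {
    int sum = 0;
    for (int start = 0; start < k; start++)
        for (int i = start; i < N; i += k)
            sum += v[i];
    return sum;
}
```

On a machine with caches, the stride-1 version tends to run faster even though both perform the same N additions, because each cache block fetched is fully used before being evicted.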

Page 56: Memory Hierarchy

56

Locality

• Example (pp. 480, M=2, N=3)

int sumarrayrows(int v[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += v[i][j];
    return sum;
}

Figure 6.18 (a) P480

Page 57: Memory Hierarchy

57

Locality

• Example (pp. 480, M=2, N=3)

Address    Contents    Access order
0          v[0][0]     1
4          v[0][1]     2
8          v[0][2]     3
12         v[1][0]     4
16         v[1][1]     5
20         v[1][2]     6

Figure 6.18 (b) P480

Page 58: Memory Hierarchy

58

Locality

• Example (pp. 480, M=2, N=3)

int sumarraycols(int v[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += v[i][j];
    return sum;
}

Figure 6.19 (a) P480

Page 59: Memory Hierarchy

59

Locality

• Example (pp. 480, M=2, N=3)

Address    Contents    Access order
0          v[0][0]     1
4          v[0][1]     3
8          v[0][2]     5
12         v[1][0]     2
16         v[1][1]     4
20         v[1][2]     6

Figure 6.19 (b) P480
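The two access orders above can be reproduced by recording the byte offset each loop nest touches, assuming 4-byte ints and the slide's M=2, N=3 (the function names are mine):

```c
#include <assert.h>

#define M 2
#define N 3

/* Record the byte offset of each element in the order the loop
 * nest visits it. Row-major traversal (i outer) yields stride-1
 * offsets; column-major traversal (j outer) jumps a whole row
 * (N * 4 bytes) on every step. */
void rowmajor_offsets(int out[M * N]) {
    int t = 0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            out[t++] = (i * N + j) * 4;   /* offset of v[i][j] */
}

void colmajor_offsets(int out[M * N]) {
    int t = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            out[t++] = (i * N + j) * 4;
}
```

Row-major visits offsets 0, 4, 8, 12, 16, 20 (stride-1); column-major visits 0, 12, 4, 16, 8, 20, which is exactly the 1, 3, 5, 2, 4, 6 access order in the table above: good spatial locality in the first case, poor in the second.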

Page 60: Memory Hierarchy

60

Locality

• Locality of the instruction fetch

– Spatial locality

• In most cases, programs are executed in sequential order

– Temporal locality

• Instructions in loops may be executed many times

Page 61: Memory Hierarchy

61

6.3 Memory Hierarchy

Page 62: Memory Hierarchy

62

Memory Hierarchy

• Fundamental properties of storage technology and computer software

– Different storage technologies have widely different access times

– Faster technologies cost more per byte than slower ones and have less capacity

– The gap between CPU and main memory speed is widening

– Well-written programs tend to exhibit good locality

Page 63: Memory Hierarchy

63

[Pyramid diagram: smaller, faster, and costlier (per byte) storage devices toward the top; larger, slower, and cheaper (per byte) storage devices toward the bottom.]

L0: registers — CPU registers hold words retrieved from cache memory.

L1: on-chip L1 cache (SRAM) — L1 cache holds cache lines retrieved from the L2 cache.

L2: off-chip L2 cache (SRAM) — L2 cache holds cache lines retrieved from memory.

L3: main memory (DRAM) — Main memory holds disk blocks retrieved from local disks.

L4: local secondary storage (local disks) — Local disks hold files retrieved from disks on remote network servers.

L5: remote secondary storage (distributed file systems, Web servers)

An example memory hierarchy Figure 6.21 P483

Page 64: Memory Hierarchy

64

Caching in Memory Hierarchy

• Cache

– A small, fast device that acts as a staging area for the data objects stored in a larger, slower device

• Caching

– Process of using a cache

Page 65: Memory Hierarchy

65

Caching in Memory Hierarchy

• In a memory hierarchy

– For each k, the faster and smaller storage device at level k serves as a cache for the larger and slower storage device at level k+1.

Page 66: Memory Hierarchy

66

Caching in a Memory Hierarchy Figure 6.22 P484

[Diagram: the larger, slower, cheaper storage device at level k+1 is partitioned into 16 blocks (0-15); the smaller, faster, more expensive device at level k caches a subset of those blocks (here 4, 9, 14, and 3). Data is copied between levels in block-sized transfer units.]

Page 67: Memory Hierarchy

67

Memory Hierarchy

• Blocks

– At level k+1

• The storage is partitioned into contiguous chunks of data objects

• Each block has a unique address or name

• Blocks can be fixed-size or variable-sized

– At level k

• The storage is partitioned into a smaller set of blocks

• The blocks are the same size as the blocks at level k+1

• The storage contains copies of a subset of the blocks at level k+1

Page 68: Memory Hierarchy

68

Memory Hierarchy

• Transfer units

– Used to copy data back and forth between level k and level k+1

Page 69: Memory Hierarchy

69

Memory Locality

• For any given pair of adjacent levels, the block size is fixed

• Different pairs of adjacent levels can have different block sizes

Page 70: Memory Hierarchy

70

General caching concepts Figure 6.22 P484

• Program needs object d, which is stored in some block b

[Diagram: level k holds blocks 4, 9, 14, and 3; level k+1 holds blocks 0-15.]

Page 71: Memory Hierarchy

71

General caching concepts P485

• Cache hit

– Program finds b in the cache at level k, e.g. block 14.

[Diagram: block 14 is found at level k; no access to level k+1 is needed.]

Page 72: Memory Hierarchy

72

General caching concepts P485

• Cache miss

– b is not at level k, so the level k cache must fetch it from level k+1, e.g. block 12.

[Diagram: block 12 is missing at level k and is fetched from level k+1.]

Page 73: Memory Hierarchy

73

General caching concepts P485

• Cache Replacement

– If the level k cache is full, then some current block must be replaced (evicted).

– Which one? Determined by the replacement policy, e.g. evict the least recently used block.

[Diagram: the fetched block evicts one of the blocks currently cached at level k.]

Page 74: Memory Hierarchy

74

Types of Cache Misses

• 1) Cold (compulsory) miss

– Cold misses occur because the cache is empty.

Page 75: Memory Hierarchy

75

Types of Cache Misses

• 2) Conflict miss

– Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.

– E.g. block i at level k+1 must be placed in block (i mod 4) at level k.

– Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.

– E.g. referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
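The (i mod 4) placement rule and the 0, 8, 0, 8, ... pathology can be simulated with a few lines of C. This toy direct-mapped cache (the names are mine) only tracks which block occupies each slot:

```c
#include <assert.h>

#define CACHE_BLOCKS 4

/* Direct-mapped toy cache at level k: block i may live only in
 * slot (i mod 4). Counts misses for a reference string. */
int count_misses(const int refs[], int n) {
    int slot[CACHE_BLOCKS];
    for (int s = 0; s < CACHE_BLOCKS; s++)
        slot[s] = -1;                       /* all slots start empty */

    int misses = 0;
    for (int r = 0; r < n; r++) {
        int s = refs[r] % CACHE_BLOCKS;
        if (slot[s] != refs[r]) {           /* miss: fetch, replace */
            slot[s] = refs[r];
            misses++;
        }
    }
    return misses;
}
```

Referencing blocks 0, 8, 0, 8, 0, 8 produces 6 misses out of 6 references: both blocks map to slot 0 and keep evicting each other, even though three other slots sit empty, which is precisely a conflict miss rather than a capacity miss.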

Page 76: Memory Hierarchy

76

Types of Cache Misses

• 3) Capacity miss

– Occurs when the set of active cache blocks (working set) is larger than the cache.

Page 77: Memory Hierarchy

77

Cache Management

• At each level, some form of logic must manage the cache

– Partition the cache storage into blocks

– Transfer blocks between different levels

– Decide when there are hits and misses

– Deal with cache hits and misses

• The logic can be hardware, software, or a combination of the two

– Compiler manages the register file

– Hardware logic manages the L1 and L2 caches

– Operating system and address translation hardware manage the main memory

Page 78: Memory Hierarchy

78

Cache Management

• Caches

– Operate automatically

– Do not require any specific (explicit) actions from the program

Page 79: Memory Hierarchy

79

Examples of caching in the hierarchy Figure 6.23 P487

Type                  What cached          Where cached       Latency (cycles)  Managed by
CPU registers         4-byte word          Registers          0                 Compiler
TLB                   Address translation  On-chip TLB        0                 Hardware MMU
L1 cache              32-byte block        On-chip L1 cache   1                 Hardware
L2 cache              32-byte block        Off-chip L2 cache  10                Hardware
Virtual memory        4-KB page            Main memory        100               Hardware + OS
Buffer cache          Parts of files       Main memory        100               OS
Network buffer cache  Parts of files       Local disk         10,000,000        AFS/NFS client
Browser cache         Web pages            Local disk         10,000,000        Web browser
Web cache             Web pages            Remote disks       1,000,000,000     Web proxy server

Acronyms: TLB: Translation Lookaside Buffer, MMU: Memory Management Unit, OS: Operating System, AFS: Andrew File System, NFS: Network File System