
204521 Digital System Architecture 1

Lecture 7 Memory-System Architecture

Pradondet Nilagupta

Spring 97

204521 Digital System Architecture 2

Memory-System Architecture

• Early computers used vacuum tubes, switching elements (e.g. relays), mercury delay lines, etc.

• In the early 1960’s, magnetic core memories were introduced

• Since then, memory has been solid state (integrated circuits) although there are many forms which we will look at

204521 Digital System Architecture 3

Memory capacity needs

• Consider that memory costs continue to drop and that memory capacities increase every year

• The book suggests that an ultimate upper limit for memory need might be 1 PB (petabyte, or 2^50 bytes)

• Current RAM chip sizes are approximately 16 Megabits per chip

204521 Digital System Architecture 4

Main Memory

• Mostly made up of RAM - Random Access Memory

• Given an address, the RAM chip retrieves the value stored at that memory location and returns it or stores the value sent to it

• Address sent on address bus, value sent over data bus

204521 Digital System Architecture 5

Typical RAM chip

[Diagram: a RAM chip with Address, Data, and Control pins connected to the address bus, data bus, and control bus.]

204521 Digital System Architecture 6

Memory Organization

• Typically several memory chips are required to make up the physical memory space of a computer

• Memory can be organized using high-order or low-order interleaving

• High-order: the highest bits of an address select the memory chip and the lowest bits select the location within it, so two contiguous bytes are in the same chip

• Low-order: the low-order bits select the chip and the high-order bits select the location within it, so consecutive bytes are spread across two or more chips (see the sketch below)
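
As an illustration (not from the slides), the following C sketch shows how an address might be split under the two schemes, assuming a hypothetical system with 4 chips of 16 KB each and a 16-bit address; the chip-select and offset fields are simply different bit fields of the same address.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed system: 4 memory chips, 16 KB (2^14 bytes) each,
 * giving a 16-bit physical address.  Which bits select the chip
 * depends on the interleaving scheme. */
#define CHIP_BITS   2          /* 4 chips -> 2 select bits  */
#define OFFSET_BITS 14         /* 16 KB   -> 14 offset bits */

/* High-order interleave: the top bits pick the chip, so
 * consecutive addresses stay inside one chip. */
static void high_order(uint16_t addr, unsigned *chip, unsigned *offset) {
    *chip   = addr >> OFFSET_BITS;               /* top 2 bits  */
    *offset = addr & ((1u << OFFSET_BITS) - 1);  /* low 14 bits */
}

/* Low-order interleave: the bottom bits pick the chip, so
 * consecutive addresses rotate through the chips. */
static void low_order(uint16_t addr, unsigned *chip, unsigned *offset) {
    *chip   = addr & ((1u << CHIP_BITS) - 1);    /* low 2 bits     */
    *offset = addr >> CHIP_BITS;                 /* remaining bits */
}

int main(void) {
    unsigned chip, off;
    for (uint16_t a = 0; a < 4; a++) {           /* four consecutive bytes */
        high_order(a, &chip, &off);
        printf("addr %u: high-order -> chip %u  ", a, chip);
        low_order(a, &chip, &off);
        printf("low-order -> chip %u\n", chip);
    }
    return 0;
}
```

Running the loop shows the contrast: under high-order interleaving addresses 0-3 all land in chip 0, while under low-order interleaving they land in chips 0, 1, 2, 3.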

204521 Digital System Architecture 7

Example

• Consider a system with a CPU, 3 channels, and 4 memory banks labelled 0-3.

• Using high-order interleave, all four banks can be used at once by 4 devices if each device accesses a byte in a different bank

• Using low-order interleave, since consecutive bytes are distributed across all 4 banks, such simultaneous independent accesses are generally not possible

204521 Digital System Architecture 8

A CPU with Three Channels

[Diagram: the CPU and Channels 1-3 connected through a memory-port controller to memory Banks 0-3.]

204521 Digital System Architecture 9

Four-way high-order interleave Memory

[Diagram: the CPU and Channels 1-3 connected through a memory-port controller to Banks 0-3, organized with four-way high-order interleaving.]

204521 Digital System Architecture 10

Four-way low-order interleave Memory

[Diagram: the CPU and Channels 1-3 connected through a memory-port controller to Banks 0-3, organized with four-way low-order interleaving.]

204521 Digital System Architecture 11

Why use low-order?

• The prior example shows that high-order interleaving allows different devices to access memory simultaneously

• However, what if the CPU wanted to fetch 4 consecutive bytes from memory at once?

• Using high-order interleave would require 4 requests and 4 transfers

• Using low-order interleave, the 4 requests can be filled in an overlapped way, which speeds up the transfer

• Many high performance computers use low-order
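
A rough, illustrative calculation (the numbers are assumptions, not from the slides): suppose each bank has a 200 ns cycle time and the bus can deliver one byte every 50 ns. With high-order interleaving the 4 consecutive bytes all live in one bank, so the accesses serialize: about 4 × 200 ns = 800 ns. With low-order interleaving the 4 bytes sit in 4 different banks whose cycles can overlap; the first byte arrives after about 200 ns and the rest follow at 50 ns bus intervals, for roughly 200 + 3 × 50 = 350 ns.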

204521 Digital System Architecture 12

Types of ROM

• ROM - read only memory (also a form of random-access memory). Usually factory programmed and unalterable

• PROM - programmable ROM - information is burned into the chip by blowing fuses; writable once

• EPROM - erasable PROMs - using ultraviolet light, the entire PROM can be erased at once

• EAROM - electrically alterable ROM - can be erased and rewritten perhaps 1,000 times before wearing out

204521 Digital System Architecture 13

Types of RAM

• SRAM (static RAM) - information stored does not need refreshing. Often built out of flip-flops as shown in appendix B. Used predominantly for registers and fast cache. Often very expensive.

• DRAM (dynamic RAM) - requires refreshing often because the stored value tends to slowly discharge (approx. in 4 ms). DRAMs are often built with capacitors rather than flip-flops. Much cheaper, slower by a factor of 2 (approx)

• Volatile - requires constant power (SRAM, DRAM). Non-volatile includes mag. core and ROM

204521 Digital System Architecture 14

Destructive/Non-destructive

• DRAMs are destructive in that reading a memory value discharges it. Therefore, a DRAM not only requires frequent refreshing, but every time a cell is read, the value must be written back into it.

• SRAM and ROM are non-destructive.

204521 Digital System Architecture 15

Other Types of memory

• Charge-coupled devices - arrays of cells that hold packets of electrons that shift to neighboring cells with each clock tick

• Magnetic bubble memories - bits are stored as magnetic domains (bubbles) in a crystalline material -- non-volatile

• Secondary storage can be thought of as a slow and possibly removable memory

204521 Digital System Architecture 16

Characteristics of common storage devices, based on sample technologies

Memory type   | Destructive read | Data life         | Write time | Read time | Write cycles allowed | Volatility (power required)
ROM           | No               | Decades           | Once       | 100 ns    | 1                    | No
PROM          | No               | Years             | Hours      | 100 ns    | Many                 | No
EPROM         | No               | Years             | ms         | 100 ns    | 1000s                | No
SRAM          | No               | While power is on | 10 ns      | 10 ns     | Infinite             | Full
DRAM          | Yes              | 4 ms              | 100 ns     | 200 ns    | Infinite             | 10%
Magnetic core | Yes              | Decades           | 1-2 ms     | 0.5-1 ms  | Infinite             | No

204521 Digital System Architecture 17

Non RAM memories

• Associative memory - items are located by their content rather than by their address.

– Example: search for all words whose high-order byte = 11011110

– Each memory location has a match bit, which is set if that location matches the content being sought (see the sketch after this list)

• Sequential Access - most often found in tape storage

• Archival memory - nonvolatile, usually secondary storage
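
The match-bit idea can be sketched in C (an illustration only; a real associative memory performs all of these comparisons in parallel in hardware):

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS 8

/* Software sketch of an associative search: find every word whose
 * high-order byte equals 0xDE (11011110). */
int main(void) {
    uint16_t mem[WORDS] = {0xDE01, 0x1234, 0xDEAD, 0x00FF,
                           0xDE77, 0xBEEF, 0x0DE0, 0xDE00};
    uint16_t key  = 0xDE00;      /* content being sought       */
    uint16_t mask = 0xFF00;      /* compare only the high byte */
    int match[WORDS];            /* one match bit per location */

    for (int i = 0; i < WORDS; i++) {
        match[i] = ((mem[i] & mask) == (key & mask));
        if (match[i])
            printf("location %d matches: 0x%04X\n", i, mem[i]);
    }
    return 0;
}
```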

204521 Digital System Architecture 18

Main-Memory System

• While a program is viewed as having one large contiguous logical address space from 0 to some maximum, physical memory may not be laid out the same way, because memory is limited in size and must also hold other things

• Memory-management hardware must be able to locate the item named by an effective address whether it resides in memory, resides elsewhere, or has been relocated to a new place in memory

204521 Digital System Architecture 19

Relocation

• Program relocation occurs when the OS moves the program from one location to another

• The relocation offset is the address to which the program has been moved (assuming it was originally at location 0)

• Relocation requires memory protection so that the program does not access memory outside of its bounds

204521 Digital System Architecture 20

Relocation offset

[Diagram: a program occupying logical addresses 0 to N is placed in main memory at physical addresses R0 to R0 + N, where R0 is the relocation offset.]

204521 Digital System Architecture 21

Example: IBM 360

• Base registers used for program relocation

• Loader initially places program in memory and loads base registers with key data addresses

• Instructions use base-displacement addressing

• Because programs compute addresses from base registers loaded at load time, an executing program cannot be relocated!

• Key-controlled memory protection - each program has an access key stored in hardware in the PSW, and the memory allocated to the program has an associated key. If a program accesses a memory location whose key does not match the PSW key, the access is disallowed.

204521 Digital System Architecture 22

CDC 6600

• Uses a relocation-address register to store relocation offset. Therefore, the OS can move a program after it has started executing.

• Memory allocated has an upper limit called the Field Length. A Field-Length register is also provided.

• Any access to a location not between offset and offset+field-length is disallowed.
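
As a rough sketch of the check just described (names and types are mine, not CDC documentation), the translation amounts to the following:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of CDC 6600-style relocation: RA is the relocation-address
 * register, FL the field-length register.  An effective address is
 * legal only if it falls inside the field; the physical address is
 * then RA + EA. */
typedef struct {
    uint32_t ra;   /* relocation address (start of the program's field) */
    uint32_t fl;   /* field length (size of the allocated region)       */
} relocation_regs;

/* Returns true and fills *phys on success; false means the hardware
 * would raise a memory-protection exception. */
bool translate(const relocation_regs *r, uint32_t ea, uint32_t *phys) {
    if (ea >= r->fl)          /* access outside the field: trap */
        return false;
    *phys = r->ra + ea;       /* relocate                       */
    return true;
}
```

Because the check uses registers rather than addresses baked into the program, the OS can change RA (and move the program) even after execution has begun.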

204521 Digital System Architecture 23

CDC 6600

[Diagram: the CPU's effective address is added to the relocation-address (RA) register to form the physical address in main memory; the effective address is also compared against the field-length (FL) register, and the hardware raises a memory-protection exception if the address exceeds the field length.]

204521 Digital System Architecture 24

Memory Security Requirements

• Must be able to distinguish between OS and user software/instructions

• Must provide facilities so that procedures can share code and data but also prevent one procedure from accessing another’s private code/data

• Must provide protected, secure and efficient I/O and distinguish between user and system traps

• For pipelined operations, must respond efficiently if illegal conditions arise

204521 Digital System Architecture 25

Cache

• Principle of locality - the most recently referenced instructions and data are likely to be used again soon

• The idea behind a cache is to speed up processing by avoiding the slowdown of main-memory accesses

• Provide a small high-speed memory (the cache) that stores the most recently used instructions and data

204521 Digital System Architecture 26

Using a Cache

• Cache consists of a number of cache entries

• Each entry contains data storage and an address tag that denotes its main-memory effective address

• When reading/writing an item to memory, first look in the cache

• Cache hit - cache currently holds requested item

• Cache miss - item not in cache, must be retrieved from memory. Usually the item will then be loaded into cache

204521 Digital System Architecture 27

Cache Writes

• What happens if the item is being written to? If the cache copy is updated and the memory copy is not, this leaves a “dirty cell”, i.e. an obsolete value in memory.

• Write-through cache - copy the new datum both to cache and memory

• Write-back cache - hold datum until a later time and then write it back into memory

• Write-around cache - if item is not in cache, simply write value to memory (and possibly load it into the cache)
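
A minimal sketch of the three write policies (the data structures, the single-word lines, and the tiny memory size are assumptions for illustration; a real write-back cache would also write a dirty victim back when a line is evicted):

```c
#include <stdbool.h>
#include <stdint.h>

#define LINES     64
#define MEM_WORDS (1u << 16)   /* toy main memory: 64 K words */

/* Toy one-word-per-line cache, just enough to contrast the policies. */
typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t data;
} line_t;

static line_t   cache[LINES];
static uint32_t main_mem[MEM_WORDS];

enum policy { WRITE_THROUGH, WRITE_BACK, WRITE_AROUND };

void cache_write(uint32_t addr, uint32_t value, enum policy p) {
    uint32_t index = addr % LINES;
    uint32_t tag   = addr / LINES;
    line_t  *l     = &cache[index];
    bool hit = l->valid && l->tag == tag;

    switch (p) {
    case WRITE_THROUGH:    /* copy the new datum to both cache and memory */
        l->valid = true; l->tag = tag; l->data = value;
        main_mem[addr % MEM_WORDS] = value;
        break;
    case WRITE_BACK:       /* update only the cache and mark it dirty;
                              memory is updated later, on eviction
                              (eviction itself is not shown here)          */
        l->valid = true; l->tag = tag; l->data = value;
        l->dirty = true;
        break;
    case WRITE_AROUND:     /* on a miss, write memory only and bypass the
                              cache; on a hit, update both                 */
        if (hit) l->data = value;
        main_mem[addr % MEM_WORDS] = value;
        break;
    }
}
```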

204521 Digital System Architecture 28

Cache Structure & Organizations

• Tag and memory: the tag holds the address, while the memory holds the data and is organized into refill lines whose length is a power of 2

• Associative caches - tags are stored in associative memory

• Direct-mapped cache - partitions main memory into k columns of n refill lines each; for each refill-line number, the cache holds the line from exactly one column

• Set-associative cache - combines both methods; for each refill-line number, the cache can hold lines from m of the k columns (m-way set associative)

204521 Digital System Architecture 29

Fully associative cache

[Diagram: a fully associative cache; any main-memory refill line can be placed in any cache location, and the tag memory of each cache entry (here holding 8, 13, 2, 0) records which main-memory refill line it contains.]

204521 Digital System Architecture 30

Direct Mapped cache

[Diagram: a direct-mapped cache; main memory is divided into columns 0-3 of refill lines, each cache location can hold the corresponding refill line from only one column, and the tag memory records that column number for each line.]

204521 Digital System Architecture 31

A two-way set-associative cache

[Diagram: a two-way set-associative cache; for each refill-line number the cache can hold lines from two of main memory's columns 0-3, with a tag-memory entry recording the column of each.]

204521 Digital System Architecture 32

Cache Address Processing

• The effective address is computed by the CPU which then attempts to fetch the item from memory

• In an associative cache, the address is compared simultaneously against all cache tags for a match

• In a direct-mapped cache, part of the address (the refill-line number) indexes one cache location, and the remaining high-order bits (the column number) are compared against the tag stored there (see the sketch below)

• Set-associative combines the two techniques
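
To make the direct-mapped case concrete, here is a sketch of how the effective address might be split; the parameters (32-byte refill lines, 256 lines in the cache) are illustrative assumptions, not values from the slides:

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32u                 /* refill-line length (power of 2) */
#define NUM_LINES  256u                /* number of lines in the cache    */

typedef struct {
    bool     valid;
    uint32_t tag;                      /* "column number" of the line held */
    uint8_t  data[LINE_BYTES];
} cache_line;

static cache_line cache[NUM_LINES];

/* Direct-mapped lookup: the low bits give the byte offset, the middle
 * bits index a unique cache line, and the remaining high bits are
 * compared against the stored tag. */
bool lookup(uint32_t ea, uint8_t *out) {
    uint32_t offset = ea % LINE_BYTES;
    uint32_t index  = (ea / LINE_BYTES) % NUM_LINES;   /* refill-line number */
    uint32_t tag    = ea / (LINE_BYTES * NUM_LINES);   /* column number      */

    cache_line *l = &cache[index];
    if (l->valid && l->tag == tag) {   /* cache hit  */
        *out = l->data[offset];
        return true;
    }
    return false;                      /* cache miss: fetch from memory */
}
```

A set-associative lookup would use the same index computation but compare the tag against each of the m lines in the selected set.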

204521 Digital System Architecture 33

Cache Performance

• Time to execute an instruction = TPI_base + T_memory_access

• where TPI_base is the time it takes the CPU to execute the instruction

• and T_memory_access is the average time it takes to fetch the instruction or datum

• T_memory_access = Hit rate × T_cache + Miss rate × T_main_memory

• Hit rate = (1 − Miss rate)
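
For example (the numbers are assumptions, not from the slides): with a 95% hit rate, a 10 ns cache, and a 100 ns main memory, T_memory_access = 0.95 × 10 ns + 0.05 × 100 ns = 9.5 + 5 = 14.5 ns, much closer to the cache speed than to the main-memory speed.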

204521 Digital System Architecture 34

Conclusion on Cache

• Cache is used to reduce the time taken for memory accesses

• Typical hit rates are 90% or higher

• Different types of processes call for different cache sizes and organizations

• In RISC machines using a Harvard architecture, there might be two caches, an instruction cache and a data cache

• Caches can be used for disk access as well

204521 Digital System Architecture 35

Virtual Memory

• What if the program you want to run is larger than the size of main memory?

• What if the physical address space is smaller than the logical address space?

• Virtual Memory is an approach to organizing programs so that only a portion of the program is in memory at any time

• Two approaches: Paging and Segmentation

204521 Digital System Architecture 36

Paging

• All of memory is broken into small blocks called pages (typically between 512 bytes and 4 KB)

• All programs and data are broken into blocks and loaded when needed (on demand)

• A memory map is used to determine if a needed page is in memory and if so where

• If a page is not in memory, it needs to be loaded by the OS
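
A minimal sketch of the translation a paged memory map performs, assuming (for illustration) 4 KB pages and a single-level page table; the field names are mine:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE  4096u               /* assumed 4 KB pages          */
#define NUM_VPAGES 1024u               /* size of this toy page table */

typedef struct {
    bool     valid;                    /* page present in main memory? */
    bool     dirty;                    /* written since it was loaded? */
    uint32_t frame;                    /* page-frame number in memory  */
} pte_t;

static pte_t page_table[NUM_VPAGES];

/* Translate a virtual address; returns false on a page fault, in which
 * case the OS must bring the page in from disk and retry. */
bool translate_page(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr / PAGE_SIZE;   /* virtual-page number */
    uint32_t offset = vaddr % PAGE_SIZE;   /* byte offset         */

    if (vpn >= NUM_VPAGES || !page_table[vpn].valid)
        return false;                      /* page fault          */

    *paddr = page_table[vpn].frame * PAGE_SIZE + offset;
    return true;
}
```

Because the page size is a power of 2, the multiplication and addition are really just a concatenation of the frame number and the byte offset.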

204521 Digital System Architecture 37

Page Table continued

[Diagram: the effective address is split into a virtual-page number and a byte offset. The virtual-page number, together with the page-table base register, selects an entry of the page table in main memory; that entry holds valid (V), dirty (D), and protection bits plus a page-frame number. The page-frame number concatenated with the byte offset gives the operand's location in main memory.]

204521 Digital System Architecture 38

Page replacement

• When a page is not in memory, a page fault occurs (a trap to the OS, which finds the page on disk and loads it)

• Where does the new page get loaded?

• What if memory is full? If this is true, something must be swapped out.

• Several page-replacement policies:

– first-in first-out (FIFO); least recently used (LRU)

204521 Digital System Architecture 39

Thrashing

• A situation where nearly every memory access causes a page fault

• Consider a very large 2-D array stored in row-major order, where the program accesses the first item in each row (i.e. all of column 1). If the array is spread across many pages, each access might require a page to be swapped in (see the sketch below)
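
The access pattern can be written out explicitly; the array dimensions below are arbitrary assumptions chosen so that each row fills roughly one page:

```c
#include <stdio.h>

#define ROWS 1024
#define COLS 1024          /* 1024 ints = 4 KB, i.e. about one page per row */

static int a[ROWS][COLS];  /* stored in row-major order in C */

int main(void) {
    long sum = 0;

    /* Touches a[0][0], a[1][0], a[2][0], ...: every access lands on a
     * different row, hence (with 4 KB pages) on a different page.  If
     * physical memory cannot hold all ROWS pages at once, nearly every
     * iteration causes a page fault -- thrashing. */
    for (int i = 0; i < ROWS; i++)
        sum += a[i][0];

    /* Iterating row by row instead touches 1024 consecutive ints per
     * page, so only about one fault occurs per page. */
    printf("%ld\n", sum);
    return 0;
}
```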

204521 Digital System Architecture 40

Segmentation

• Paging may split data or routines up because of the fixed-size nature of a page

• Instead, programs and data are often naturally divided into segments, such as an array being one segment, a procedure being a segment, etc…

• Segmentation is virtual memory in which the blocks loaded are not fixed-size pages but program segments

204521 Digital System Architecture 41

Segmentation Map

• Similar to a page map (table); here each separable unit of the program is stored in its own segment.

• The segmentation map lists each segment by segment number and stores a segment base address (the segment's location in memory).

• The segmentation map also has valid and dirty bits and protection information; the map can be held in a TLB in the cache.

204521 Digital System Architecture 42

Segmentation Map continued

• The approach differs from paging because segments are not of fixed size

• Rather than concatenating a frame number with a byte offset as in paging, the segment base address and the offset are added together

• Also, most systems that use segmentation keep separate tables for system segments and user segments thus providing two segmentation maps
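
A minimal sketch of segment translation, assuming (for illustration) a simple segment table with base and limit fields; note the addition of base and offset, in contrast to the concatenation used for page frames:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SEGS 16

typedef struct {
    bool     valid;     /* segment present in memory?         */
    uint32_t base;      /* segment's base address in memory   */
    uint32_t limit;     /* segment length (not a fixed size!) */
} seg_entry;

static seg_entry seg_map[NUM_SEGS];

/* Translate (segment number, offset) to a physical address.
 * Returns false on a segment fault or protection violation. */
bool seg_translate(uint32_t seg, uint32_t offset, uint32_t *paddr) {
    if (seg >= NUM_SEGS || !seg_map[seg].valid)
        return false;                       /* segment fault        */
    if (offset >= seg_map[seg].limit)
        return false;                       /* outside the segment  */
    *paddr = seg_map[seg].base + offset;    /* add, not concatenate */
    return true;
}
```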

204521 Digital System Architecture 43

Splinters

• One problem with segmentation is that, since segments differ in size, the OS must find a large enough free space in which to load a new segment

• If the OS removes a slightly larger segment, some of that space will not be used by the new segment; the leftover piece is called a splinter

• Memory can splinter, leaving many small gaps, which is of course not an efficient use of memory

204521 Digital System Architecture 44

Segmentation and Paging

• Segmentation reflects the logical structure of a program and is therefore a more natural way of dividing up memory

• Paging does not have the disadvantage of splintering

• Combining the two methods provides a technique that maintains the best of both

204521 Digital System Architecture 45

Cache vs. Virtual Memory

• Both approaches use hardware to map the effective address into its true location

• Both hold data for the CPU, replacing older information with newer information

• They differ greatly in purpose and in what happens when information is not found (i.e. a cache miss versus a page or segment fault)

• Caches are used to speed up memory fetching, while virtual memory is used to enlarge the apparent (logical) memory space of a computer beyond its physical size

204521 Digital System Architecture 46

Mem. Banking/Expanded Mem

• A computer’s physical address space is restricted by the width of its addresses -- e.g. 16-bit addresses allow 2^16 possible addresses (64 KB)

• However, one can install more memory than this and switch between different sets of memory chips by memory banking or expanded memory (see the sketch below)
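
A small C simulation of the idea (all names and sizes are made up for illustration): the CPU sees a single fixed window, and a bank-select register chooses which physical bank answers accesses to that window.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed banked system: one 16 KB window visible to the CPU, and a
 * bank-select register choosing which of 4 physical banks it maps to. */
#define NUM_BANKS 4
#define BANK_SIZE (16u * 1024u)

static uint8_t banks[NUM_BANKS][BANK_SIZE];  /* the physical RAM banks    */
static uint8_t bank_select;                  /* banking-hardware register */

/* Every CPU access to the window is routed to the selected bank. */
static uint8_t window_read(uint16_t offset)             { return banks[bank_select][offset]; }
static void    window_write(uint16_t offset, uint8_t v) { banks[bank_select][offset] = v; }

int main(void) {
    bank_select = 2;            /* program the bank-select register */
    window_write(0x100, 0xAB);  /* this write lands in bank 2       */

    bank_select = 0;            /* switch banks...                  */
    printf("bank 0: %02X\n", window_read(0x100));   /* different datum */

    bank_select = 2;
    printf("bank 2: %02X\n", window_read(0x100));   /* 0xAB again      */
    return 0;
}
```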

204521 Digital System Architecture 47

Memory Banking

[Diagram: the CPU connected by its address bus and data bus directly to four RAM chips, RAM 0-3.]

204521 Digital System Architecture 48

Memory Banking

[Diagram: the CPU's address and data buses connect to the RAM chips through banking hardware; a bank-select register determines which of Banks 0-3 is currently mapped into the CPU's address space.]

204521 Digital System Architecture 49

Memory Design Issues

• Memory speed vs. CPU speed - CPU speed is increasing faster than memory speed, leaving a growing discrepancy

• Memory address space - the size of the address bus limits the logical address space. Future architectures should anticipate larger address spaces (up to 64-bit addresses, or 2^64 bytes)

• Speed-cost tradeoffs - faster memories are available, but at higher cost

204521 Digital System Architecture 50

Speed-cost Tradeoffs

There are two key characteristics of memory technology:

• Its unit price is decreasing very rapidly, while its speed is slowly increasing.

• There is a great variety of speeds and costs in memory devices.

– Slow, inexpensive devices for main memory

– Fast devices for cache

– Very fast, expensive devices for registers and control store.