
204521 Digital System Architecture 1

Lecture 7 Memory-System Architecture

Pradondet Nilagupta

Spring 97

204521 Digital System Architecture 2

Memory-System Architecture

• Early computers used vacuum tubes, switching elements (e.g. relays), mercury delay lines, etc.

• In the early 1960’s, magnetic core memories were introduced

• Since then, memory has been solid state (integrated circuits) although there are many forms which we will look at

204521 Digital System Architecture 3

Memory capacity needs

• Consider that memory costs continue to drop and that memory capacities increase every year

• The book suggests that an ultimate upper limit for memory need might be 1 PB (petabyte, or 2^50 bytes)

• Current RAM chip sizes are approximately 16 Megabits per chip

204521 Digital System Architecture 4

Main Memory

• Mostly made up of RAM - Random Access Memory

• Given an address, the RAM chip retrieves the value stored at that memory location and returns it or stores the value sent to it

• Address sent on address bus, value sent over data bus

204521 Digital System Architecture 5

Typical RAM chip

[Diagram: a RAM chip with Address, Data, and Control pins connected to the address bus, data bus, and control bus.]

204521 Digital System Architecture 6

Memory Organization

• Typically several memory chips are required to make up the physical memory space of a computer

• Memory can be organized using high-order or low-order interleaving

• High-order: the highest bits of an address select the memory chip and the lowest bits select the location within it, so two contiguous bytes are in the same chip

• Low-order: the low-order bits select the chip and the high-order bits select the location within it, so consecutive bytes are spread across two or more chips (see the sketch below)
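
As an illustration (not from the slides), the following C sketch shows how an address might be split under the two schemes, assuming a hypothetical system with 4 chips of 16 KB each and a 16-bit address; the chip-select and offset fields are simply different bit fields of the same address.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed system: 4 memory chips, 16 KB (2^14 bytes) each,
 * giving a 16-bit physical address.  Which bits select the chip
 * depends on the interleaving scheme. */
#define CHIP_BITS   2          /* 4 chips -> 2 select bits  */
#define OFFSET_BITS 14         /* 16 KB   -> 14 offset bits */

/* High-order interleave: the top bits pick the chip, so
 * consecutive addresses stay inside one chip. */
static void high_order(uint16_t addr, unsigned *chip, unsigned *offset) {
    *chip   = addr >> OFFSET_BITS;               /* top 2 bits  */
    *offset = addr & ((1u << OFFSET_BITS) - 1);  /* low 14 bits */
}

/* Low-order interleave: the bottom bits pick the chip, so
 * consecutive addresses rotate through the chips. */
static void low_order(uint16_t addr, unsigned *chip, unsigned *offset) {
    *chip   = addr & ((1u << CHIP_BITS) - 1);    /* low 2 bits     */
    *offset = addr >> CHIP_BITS;                 /* remaining bits */
}

int main(void) {
    unsigned chip, off;
    for (uint16_t a = 0; a < 4; a++) {           /* four consecutive bytes */
        high_order(a, &chip, &off);
        printf("addr %u: high-order -> chip %u  ", a, chip);
        low_order(a, &chip, &off);
        printf("low-order -> chip %u\n", chip);
    }
    return 0;
}
```

Running the loop shows the contrast: under high-order interleaving addresses 0-3 all land in chip 0, while under low-order interleaving they land in chips 0, 1, 2, 3.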

204521 Digital System Architecture 7

Example

• Consider a system with a CPU, 3 channels, and 4 memory banks labelled 0-3.

• Using high-order interleave, all four banks can be used at once by 4 devices if each device accesses a byte in a different bank

• Using low-order interleave, since consecutive bytes are distributed across all 4 banks, such simultaneous independent accesses are generally not possible

204521 Digital System Architecture 8

A CPU with Three Channels

[Diagram: the CPU and Channels 1-3 connected through a memory-port controller to memory Banks 0-3.]

204521 Digital System Architecture 9

Four-way high-order interleave Memory

[Diagram: the CPU and Channels 1-3 connected through a memory-port controller to Banks 0-3, organized with four-way high-order interleaving.]

204521 Digital System Architecture 10

Four-way low-order interleave Memory

[Diagram: the CPU and Channels 1-3 connected through a memory-port controller to Banks 0-3, organized with four-way low-order interleaving.]

204521 Digital System Architecture 11

Why use low-order?

• The prior example shows that high-order interleaving allows different devices to access memory simultaneously

• However, what if the CPU wanted to fetch 4 consecutive bytes from memory at once?

• Using high-order interleave would require 4 requests and 4 transfers

• Using low-order interleave, the 4 requests can be filled in an overlapped way, which speeds up the transfer

• Many high performance computers use low-order
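
A rough, illustrative calculation (the numbers are assumptions, not from the slides): suppose each bank has a 200 ns cycle time and the bus can deliver one byte every 50 ns. With high-order interleaving the 4 consecutive bytes all live in one bank, so the accesses serialize: about 4 × 200 ns = 800 ns. With low-order interleaving the 4 bytes sit in 4 different banks whose cycles can overlap; the first byte arrives after about 200 ns and the rest follow at 50 ns bus intervals, for roughly 200 + 3 × 50 = 350 ns.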

204521 Digital System Architecture 12

Types of ROM

• ROM - read only memory (also a form of random-access memory). Usually factory programmed and unalterable

• PROM - programmable ROM - information is burned into the chip by blowing fuses; writable once

• EPROM - erasable PROMs - using ultraviolet light, the entire PROM can be erased at once

• EAROM - electrically alterable ROM - can be erased and rewritten perhaps 1,000 times before wearing out

204521 Digital System Architecture 13

Types of RAM

• SRAM (static RAM) - information stored does not need refreshing. Often built out of flip-flops as shown in appendix B. Used predominantly for registers and fast cache. Often very expensive.

• DRAM (dynamic RAM) - requires refreshing often because the stored value tends to slowly discharge (approx. in 4 ms). DRAMs are often built with capacitors rather than flip-flops. Much cheaper, slower by a factor of 2 (approx)

• Volatile - requires constant power (SRAM, DRAM). Non-volatile includes mag. core and ROM

204521 Digital System Architecture 14

Destructive/Non-destructive

• DRAMs are destructive in that reading a memory value discharges it. Therefore, a DRAM not only requires frequent refreshing, but every time a cell is read, the value must be written back into it.

• SRAM and ROM are non-destructive.

204521 Digital System Architecture 15

Other Types of memory

• Charge-coupled devices - arrays of cells that hold packets of electrons that shift to neighboring cells with each clock tick

• Magnetic bubble memories - bits are stored as magnetic domains (bubbles) in a crystalline material -- non-volatile

• Secondary storage can be thought of as a slow and possibly removable memory

204521 Digital System Architecture 16

Characteristics of common storage devices, based on sample technologies

Memory type   | Destructive read | Data life         | Write time | Read time | Write cycles allowed | Volatility (power required)
ROM           | No               | Decades           | Once       | 100 ns    | 1                    | No
PROM          | No               | Years             | Hours      | 100 ns    | Many                 | No
EPROM         | No               | Years             | ms         | 100 ns    | 1000s                | No
SRAM          | No               | While power is on | 10 ns      | 10 ns     | Infinite             | Full
DRAM          | Yes              | 4 ms              | 100 ns     | 200 ns    | Infinite             | 10%
Magnetic core | Yes              | Decades           | 1-2 ms     | 0.5-1 ms  | Infinite             | No

204521 Digital System Architecture 17

Non RAM memories

• Associative memory - items are located by their content rather than by their address.

– Example: search for all words whose high-order byte = 11011110

– Each memory location has a match bit, which is set if that location matches the content being sought (see the sketch after this list)

• Sequential Access - most often found in tape storage

• Archival memory - nonvolatile, usually secondary storage
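
The match-bit idea can be sketched in C (an illustration only; a real associative memory performs all of these comparisons in parallel in hardware):

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS 8

/* Software sketch of an associative search: find every word whose
 * high-order byte equals 0xDE (11011110). */
int main(void) {
    uint16_t mem[WORDS] = {0xDE01, 0x1234, 0xDEAD, 0x00FF,
                           0xDE77, 0xBEEF, 0x0DE0, 0xDE00};
    uint16_t key  = 0xDE00;      /* content being sought       */
    uint16_t mask = 0xFF00;      /* compare only the high byte */
    int match[WORDS];            /* one match bit per location */

    for (int i = 0; i < WORDS; i++) {
        match[i] = ((mem[i] & mask) == (key & mask));
        if (match[i])
            printf("location %d matches: 0x%04X\n", i, mem[i]);
    }
    return 0;
}
```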

204521 Digital System Architecture 18

Main-Memory System

• While a program is viewed as having one large contiguous logical address space from 0 to some maximum, physical memory may not be laid out the same way, because memory is limited in size and must also hold other things

• Memory-management hardware must be able to locate the item named by an effective address whether it resides in memory, resides elsewhere, or has been relocated to a new place in memory

204521 Digital System Architecture 19

Relocation

• Program relocation occurs when the OS moves the program from one location to another

• The relocation offset is the address to which the program has been moved (assuming it was originally at location 0)

• Relocation requires memory protection so that the program does not access memory outside of its bounds

204521 Digital System Architecture 20

Relocation offset

[Diagram: a program occupying logical addresses 0 to N is placed in main memory at physical addresses R0 to R0 + N, where R0 is the relocation offset.]

204521 Digital System Architecture 21

Example: IBM 360

• Base registers used for program relocation

• Loader initially places program in memory and loads base registers with key data addresses

• Instructions use base-displacement addressing

• Because programs compute addresses from base registers loaded at load time, an executing program cannot be relocated!

• Key-controlled memory protection - each program has an access key stored in hardware in the PSW, and the memory allocated to the program has an associated key. If a program accesses a memory location whose key does not match the PSW key, the access is disallowed.

204521 Digital System Architecture 22

CDC 6600

• Uses a relocation-address register to store relocation offset. Therefore, the OS can move a program after it has started executing.

• Memory allocated has an upper limit called the Field Length. A Field-Length register is also provided.

• Any access to a location not between offset and offset+field-length is disallowed.
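
As a rough sketch of the check just described (names and types are mine, not CDC documentation), the translation amounts to the following:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of CDC 6600-style relocation: RA is the relocation-address
 * register, FL the field-length register.  An effective address is
 * legal only if it falls inside the field; the physical address is
 * then RA + EA. */
typedef struct {
    uint32_t ra;   /* relocation address (start of the program's field) */
    uint32_t fl;   /* field length (size of the allocated region)       */
} relocation_regs;

/* Returns true and fills *phys on success; false means the hardware
 * would raise a memory-protection exception. */
bool translate(const relocation_regs *r, uint32_t ea, uint32_t *phys) {
    if (ea >= r->fl)          /* access outside the field: trap */
        return false;
    *phys = r->ra + ea;       /* relocate                       */
    return true;
}
```

Because the check uses registers rather than addresses baked into the program, the OS can change RA (and move the program) even after execution has begun.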

204521 Digital System Architecture 23

CDC 6600

[Diagram: the CPU's effective address is added to the relocation-address (RA) register to form the physical address in main memory; the effective address is also compared against the field-length (FL) register, and the hardware raises a memory-protection exception if the address exceeds the field length.]

204521 Digital System Architecture 24

Memory Security Requirements

• Must be able to distinguish between OS and user software/instructions

• Must provide facilities so that procedures can share code and data but also prevent one procedure from accessing another’s private code/data

• Must provide protected, secure and efficient I/O and distinguish between user and system traps

• For pipelined operations, must respond efficiently if illegal conditions arise

204521 Digital System Architecture 25

Cache

• Principle of locality - the most recently referenced instructions and data are likely to be used again soon

• The idea behind a cache is to speed up processing by avoiding the slowdown of main-memory accesses

• Provide a small high-speed memory (the cache) that stores the most recently used instructions and data

204521 Digital System Architecture 26

Using a Cache

• Cache consists of a number of cache entries

• Each entry contains data storage and an address tag that denotes its main-memory effective address

• When reading/writing an item to memory, first look in the cache

• Cache hit - cache currently holds requested item

• Cache miss - item not in cache, must be retrieved from memory. Usually the item will then be loaded into cache

204521 Digital System Architecture 27

Cache Writes

• What happens if the item is being written to? If the cache copy is updated and the memory copy is not, this leaves a “dirty cell”, i.e. an obsolete value in memory.

• Write-through cache - copy the new datum both to cache and memory

• Write-back cache - hold datum until a later time and then write it back into memory

• Write-around cache - if item is not in cache, simply write value to memory (and possibly load it into the cache)
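
A minimal sketch of the three write policies (the data structures, the single-word lines, and the tiny memory size are assumptions for illustration; a real write-back cache would also write a dirty victim back when a line is evicted):

```c
#include <stdbool.h>
#include <stdint.h>

#define LINES     64
#define MEM_WORDS (1u << 16)   /* toy main memory: 64 K words */

/* Toy one-word-per-line cache, just enough to contrast the policies. */
typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t data;
} line_t;

static line_t   cache[LINES];
static uint32_t main_mem[MEM_WORDS];

enum policy { WRITE_THROUGH, WRITE_BACK, WRITE_AROUND };

void cache_write(uint32_t addr, uint32_t value, enum policy p) {
    uint32_t index = addr % LINES;
    uint32_t tag   = addr / LINES;
    line_t  *l     = &cache[index];
    bool hit = l->valid && l->tag == tag;

    switch (p) {
    case WRITE_THROUGH:    /* copy the new datum to both cache and memory */
        l->valid = true; l->tag = tag; l->data = value;
        main_mem[addr % MEM_WORDS] = value;
        break;
    case WRITE_BACK:       /* update only the cache and mark it dirty;
                              memory is updated later, on eviction
                              (eviction itself is not shown here)          */
        l->valid = true; l->tag = tag; l->data = value;
        l->dirty = true;
        break;
    case WRITE_AROUND:     /* on a miss, write memory only and bypass the
                              cache; on a hit, update both                 */
        if (hit) l->data = value;
        main_mem[addr % MEM_WORDS] = value;
        break;
    }
}
```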

204521 Digital System Architecture 28

Cache Structure & Organizations

• Tag and memory: the tag holds the address, while the memory holds the data and is organized into refill lines whose length is a power of 2

• Associative caches - tags are stored in associative memory

• Direct-mapped cache - partitions main memory into k columns of n refill lines each; for each refill-line number, the cache holds the line from exactly one column

• Set-associative cache - combines both methods; for each refill-line number, the cache can hold lines from m of the k columns (m-way set associative)

204521 Digital System Architecture 29

Fully associative cache

[Diagram: a fully associative cache; any main-memory refill line can be placed in any cache location, and the tag memory of each cache entry (here holding 8, 13, 2, 0) records which main-memory refill line it contains.]

204521 Digital System Architecture 30

Direct Mapped cache

[Diagram: a direct-mapped cache; main memory is divided into columns 0-3 of refill lines, each cache location can hold the corresponding refill line from only one column, and the tag memory records that column number for each line.]

204521 Digital System Architecture 31

A two-way set-associative cache

[Diagram: a two-way set-associative cache; for each refill-line number the cache can hold lines from two of main memory's columns 0-3, with a tag-memory entry recording the column of each.]

204521 Digital System Architecture 32

Cache Address Processing

• The effective address is computed by the CPU which then attempts to fetch the item from memory

• In an associative cache, the address is compared simultaneously against all cache tags for a match

• In a direct-mapped cache, part of the address (the refill-line number) indexes one cache location, and the remaining high-order bits (the column number) are compared against the tag stored there (see the sketch below)

• Set-associative combines the two techniques
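
To make the direct-mapped case concrete, here is a sketch of how the effective address might be split; the parameters (32-byte refill lines, 256 lines in the cache) are illustrative assumptions, not values from the slides:

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32u                 /* refill-line length (power of 2) */
#define NUM_LINES  256u                /* number of lines in the cache    */

typedef struct {
    bool     valid;
    uint32_t tag;                      /* "column number" of the line held */
    uint8_t  data[LINE_BYTES];
} cache_line;

static cache_line cache[NUM_LINES];

/* Direct-mapped lookup: the low bits give the byte offset, the middle
 * bits index a unique cache line, and the remaining high bits are
 * compared against the stored tag. */
bool lookup(uint32_t ea, uint8_t *out) {
    uint32_t offset = ea % LINE_BYTES;
    uint32_t index  = (ea / LINE_BYTES) % NUM_LINES;   /* refill-line number */
    uint32_t tag    = ea / (LINE_BYTES * NUM_LINES);   /* column number      */

    cache_line *l = &cache[index];
    if (l->valid && l->tag == tag) {   /* cache hit  */
        *out = l->data[offset];
        return true;
    }
    return false;                      /* cache miss: fetch from memory */
}
```

A set-associative lookup would use the same index computation but compare the tag against each of the m lines in the selected set.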

204521 Digital System Architecture 33

Cache Performance

• Time to execute an instruction = TPI_base + T_memory_access

• where TPI_base is the time it takes the CPU to execute the instruction

• and T_memory_access is the average time it takes to fetch the instruction or datum

• T_memory_access = Hit rate × T_cache + Miss rate × T_main_memory

• Hit rate = (1 − Miss rate)
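
For example (the numbers are assumptions, not from the slides): with a 95% hit rate, a 10 ns cache, and a 100 ns main memory, T_memory_access = 0.95 × 10 ns + 0.05 × 100 ns = 9.5 + 5 = 14.5 ns, much closer to the cache speed than to the main-memory speed.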

204521 Digital System Architecture 34

Conclusion on Cache

• Cache is used to reduce the time taken for memory accesses

• Typical hit rates are 90% or higher

• Different types of processes call for different cache sizes and organizations

• In RISC machines using a Harvard architecture, there might be two caches, an instruction cache and a data cache

• Caches can be used for disk access as well

204521 Digital System Architecture 35

Virtual Memory

• What if the program you want to run is larger than the size of main memory?

• What if the physical address space is smaller than the logical address space?

• Virtual Memory is an approach to organizing programs so that only a portion of the program is in memory at any time

• Two approaches: Paging and Segmentation

204521 Digital System Architecture 36

Paging

• All of memory is broken into small blocks called pages (typically between 512 bytes and 4 KB)

• All programs and data are broken into blocks and loaded when needed (on demand)

• A memory map is used to determine if a needed page is in memory and if so where

• If a page is not in memory, it needs to be loaded by the OS
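
A minimal sketch of the translation a paged memory map performs, assuming (for illustration) 4 KB pages and a single-level page table; the field names are mine:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE  4096u               /* assumed 4 KB pages          */
#define NUM_VPAGES 1024u               /* size of this toy page table */

typedef struct {
    bool     valid;                    /* page present in main memory? */
    bool     dirty;                    /* written since it was loaded? */
    uint32_t frame;                    /* page-frame number in memory  */
} pte_t;

static pte_t page_table[NUM_VPAGES];

/* Translate a virtual address; returns false on a page fault, in which
 * case the OS must bring the page in from disk and retry. */
bool translate_page(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr / PAGE_SIZE;   /* virtual-page number */
    uint32_t offset = vaddr % PAGE_SIZE;   /* byte offset         */

    if (vpn >= NUM_VPAGES || !page_table[vpn].valid)
        return false;                      /* page fault          */

    *paddr = page_table[vpn].frame * PAGE_SIZE + offset;
    return true;
}
```

Because the page size is a power of 2, the multiplication and addition are really just a concatenation of the frame number and the byte offset.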

204521 Digital System Architecture 37

Page Table continued

[Diagram: the effective address is split into a virtual-page number and a byte offset. The virtual-page number, together with the page-table base register, selects an entry of the page table in main memory; that entry holds valid (V), dirty (D), and protection bits plus a page-frame number. The page-frame number concatenated with the byte offset gives the operand's location in main memory.]

204521 Digital System Architecture 38

Page replacement

• When a page is not in memory, a page fault occurs (a trap to the OS, which finds the page on disk and loads it)

• Where does the new page get loaded?

• What if memory is full? If this is true, something must be swapped out.

• Several page-replacement policies:

– first-in first-out (FIFO); least recently used (LRU)

204521 Digital System Architecture 39

Thrashing

• A situation where nearly every memory access causes a page fault

• Consider a very large 2-D array stored in row-major order, where the program accesses the first item in each row (i.e. all of column 1). If the array is spread across many pages, each access might require a page to be swapped in (see the sketch below)
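
The access pattern can be written out explicitly; the array dimensions below are arbitrary assumptions chosen so that each row fills roughly one page:

```c
#include <stdio.h>

#define ROWS 1024
#define COLS 1024          /* 1024 ints = 4 KB, i.e. about one page per row */

static int a[ROWS][COLS];  /* stored in row-major order in C */

int main(void) {
    long sum = 0;

    /* Touches a[0][0], a[1][0], a[2][0], ...: every access lands on a
     * different row, hence (with 4 KB pages) on a different page.  If
     * physical memory cannot hold all ROWS pages at once, nearly every
     * iteration causes a page fault -- thrashing. */
    for (int i = 0; i < ROWS; i++)
        sum += a[i][0];

    /* Iterating row by row instead touches 1024 consecutive ints per
     * page, so only about one fault occurs per page. */
    printf("%ld\n", sum);
    return 0;
}
```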

204521 Digital System Architecture 40

Segmentation

• Paging may split data or routines up because of the fixed-size nature of a page

• Instead, programs and data are often naturally divided into segments, such as an array being one segment, a procedure being a segment, etc…

• Segmentation is virtual memory in which the blocks loaded are not fixed-size pages but program segments

204521 Digital System Architecture 41

Segmentation Map

• Similar to a page map (table); here each separable unit of the program is stored in its own segment.

• The segmentation map lists each segment by segment number and stores a segment base address (the segment's location in memory).

• The segmentation map also has valid and dirty bits and protection information; the map can be held in a TLB in the cache.

204521 Digital System Architecture 42

Segmentation Map continued

• The approach differs from paging because segments are not of fixed size

• Rather than concatenating a frame number with a byte offset as in paging, the segment base address and the offset are added together

• Also, most systems that use segmentation keep separate tables for system segments and user segments thus providing two segmentation maps
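
A minimal sketch of segment translation, assuming (for illustration) a simple segment table with base and limit fields; note the addition of base and offset, in contrast to the concatenation used for page frames:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SEGS 16

typedef struct {
    bool     valid;     /* segment present in memory?         */
    uint32_t base;      /* segment's base address in memory   */
    uint32_t limit;     /* segment length (not a fixed size!) */
} seg_entry;

static seg_entry seg_map[NUM_SEGS];

/* Translate (segment number, offset) to a physical address.
 * Returns false on a segment fault or protection violation. */
bool seg_translate(uint32_t seg, uint32_t offset, uint32_t *paddr) {
    if (seg >= NUM_SEGS || !seg_map[seg].valid)
        return false;                       /* segment fault        */
    if (offset >= seg_map[seg].limit)
        return false;                       /* outside the segment  */
    *paddr = seg_map[seg].base + offset;    /* add, not concatenate */
    return true;
}
```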

204521 Digital System Architecture 43

Splinters

• One problem with segmentation is that, since segments differ in size, the OS must find a large enough free space in which to load a new segment

• If the OS removes a slightly larger segment, some of that space will not be used by the new segment; the leftover piece is called a splinter

• Memory can splinter, leaving many small gaps, which is of course not an efficient use of memory

204521 Digital System Architecture 44

Segmentation and Paging

• Segmentation reflects the logical structure of a program and is therefore a more natural way of dividing up memory

• Paging does not have the disadvantage of splintering

• Combining the two methods provides a technique that maintains the best of both

204521 Digital System Architecture 45

Cache vs. Virtual Memory

• Both approaches use hardware to map the effective address into its true location

• Both hold data for the CPU, replacing older information with newer information

• They differ greatly in purpose and in what happens when information is not found (i.e. a cache miss versus a page or segment fault)

• Caches are used to speed up memory fetching, while virtual memory is used to enlarge the apparent (logical) memory space of a computer beyond its physical size

204521 Digital System Architecture 46

Mem. Banking/Expanded Mem

• A computer’s physical address space is restricted by the width of its addresses -- e.g. 16-bit addresses allow 2^16 possible addresses (64 KB)

• However, one can install more memory than this and switch between different sets of memory chips by memory banking or expanded memory (see the sketch below)
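
A small C simulation of the idea (all names and sizes are made up for illustration): the CPU sees a single fixed window, and a bank-select register chooses which physical bank answers accesses to that window.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed banked system: one 16 KB window visible to the CPU, and a
 * bank-select register choosing which of 4 physical banks it maps to. */
#define NUM_BANKS 4
#define BANK_SIZE (16u * 1024u)

static uint8_t banks[NUM_BANKS][BANK_SIZE];  /* the physical RAM banks    */
static uint8_t bank_select;                  /* banking-hardware register */

/* Every CPU access to the window is routed to the selected bank. */
static uint8_t window_read(uint16_t offset)             { return banks[bank_select][offset]; }
static void    window_write(uint16_t offset, uint8_t v) { banks[bank_select][offset] = v; }

int main(void) {
    bank_select = 2;            /* program the bank-select register */
    window_write(0x100, 0xAB);  /* this write lands in bank 2       */

    bank_select = 0;            /* switch banks...                  */
    printf("bank 0: %02X\n", window_read(0x100));   /* different datum */

    bank_select = 2;
    printf("bank 2: %02X\n", window_read(0x100));   /* 0xAB again      */
    return 0;
}
```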

204521 Digital System Architecture 47

Memory Banking

[Diagram: the CPU connected by its address bus and data bus directly to four RAM chips, RAM 0-3.]

204521 Digital System Architecture 48

Memory Banking

[Diagram: the CPU's address and data buses connect to the RAM chips through banking hardware; a bank-select register determines which of Banks 0-3 is currently mapped into the CPU's address space.]

204521 Digital System Architecture 49

Memory Design Issues

• Memory speed vs. CPU speed - CPU speed is increasing faster than memory speed, leaving a growing discrepancy

• Memory address space - the size of the address bus limits the logical address space. Future architectures should anticipate larger address spaces (up to 64-bit addresses, or 2^64 bytes)

• Speed-cost tradeoffs - faster memories are available, but at higher cost

204521 Digital System Architecture 50

Speed-cost Tradeoffs

There are two key characteristics of memory technology:

• Its unit price is decreasing very rapidly, while its speed is slowly increasing.

• There is a great variety of speeds and costs in memory devices.

– Slow, inexpensive devices for main memory

– Fast devices for cache

– Very fast, expensive devices for registers and control store.