Computer organization memory

The Memory System

Deepak John, Department of Computer Applications, SJCET-Pala

Basic Concepts

The maximum size of the memory that can be used in any computer is determined by the addressing scheme.

A 16-bit address can select 2^16 = 64K memory locations.
Most modern computers are byte addressable.
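The relation between address width and memory size can be checked with a short sketch. The function names below are illustrative only and do not come from the slides.

```python
def addressable_locations(address_bits):
    """Number of distinct locations a k-bit address can select (2^k)."""
    return 1 << address_bits


def human_readable(count):
    """Express a count with binary K/M/G suffixes when it divides evenly."""
    for suffix, shift in (("G", 30), ("M", 20), ("K", 10)):
        if count >= (1 << shift) and count % (1 << shift) == 0:
            return f"{count >> shift}{suffix}"
    return str(count)


for bits in (16, 21, 32):
    print(bits, "address bits ->", human_readable(addressable_locations(bits)), "locations")
```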

Figure 5.1. Connection of the memory to the processor. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W, MFC, etc.) link the processor and the memory. The memory has up to 2^k addressable locations and a word length of n bits.

The Memory Hierarchy

Main Memory: memory unit that communicates directly with the CPU (RAM).
Auxiliary Memory: devices that provide backup storage (disk drives).
Cache Memory: special very-high-speed memory used to increase the processing speed (cache RAM).

[Figure: memory hierarchy, showing auxiliary memory (magnetic tapes, magnetic disks), I/O processor, main memory, cache memory, and CPU.]

Basic Concepts

Memory access time: the average time taken to read a unit of information.
Memory cycle time: the average time that elapses between two successive read operations.

Semiconductor RAM Memories

Internal organization of memory chips

• Each memory cell can hold one bit of information.
• Memory cells are organized in the form of an array.
• One row is one memory word.
• All cells of a row are connected to a common line, known as the "word line".
• The word line is connected to the address decoder.
• Sense/Write circuits are connected to the data input/output lines of the memory chip.

Internal Organization of Memory Chips

Figure 5.2. Organization of bit cells in a memory chip. Address lines A0 to A3 feed an address decoder that drives word lines W0 to W15. Each column of memory cells (flip-flops, FF) shares a pair of bit lines b and b' connected to a Sense/Write circuit. The Sense/Write circuits connect to the data input/output lines b7 ... b1, b0 and are controlled by R/W and CS.

A Memory Chip

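Figure 5.3. Organization of a 1K (1024) × 1 memory chip. The 10-bit address is split into a 5-bit row address and a 5-bit column address. A 5-bit decoder turns the row address into word lines W0 to W31 of a 32 × 32 memory cell array; the column address drives a 32-to-1 output multiplexer and input demultiplexer. Sense/Write circuitry, R/W, and CS control the single data input/output line.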

SRAM Cell

• Two transistor inverters are cross-connected to implement a basic flip-flop.
• The cell is connected to one word line and two bit lines by transistors T1 and T2.
• When the word line is at ground level, the transistors are turned off and the latch retains its state.
• Two states:
  State 1: the logic value at point X is 1 and at point Y is 0.
  State 0: the logic value at point X is 0 and at point Y is 1.

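[Figure: a static RAM cell. A latch with nodes X and Y is connected through transistors T1 and T2 to bit lines b and b'; the word line controls T1 and T2.]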

Read operation
I. The word line is activated to close switches T1 and T2.
II. Sense/Write circuits at the bottom monitor the state of b and b' and set the output accordingly.

Write operation
I. The state of the cell is set by placing the appropriate value on b and b'.
II. The word line is then activated.

Asynchronous DRAMs

Static RAMs (SRAMs):
• Consist of circuits that are capable of retaining their state as long as power is applied.
• Volatile memories, because their contents are lost when power is interrupted.
• Access times of static RAMs are in the range of a few nanoseconds.
• The cost is usually high.

Dynamic RAMs (DRAMs):
• Do not retain their state indefinitely.
• Contents must be periodically refreshed.
• Contents may be refreshed while accessing them for reading.

Asynchronous DRAMs

• Each row can store 512 bytes. 12 bits select a row, and 9 bits select a group of 8 bits (one byte) in a row, for a total of 21 address bits.
• First apply the row address; the RAS signal latches the row address. Then apply the column address; the CAS signal latches the column address.
• Timing of the memory unit is controlled by a specialized unit which generates RAS and CAS.
• This is asynchronous DRAM.
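The row/column split described above can be sketched as follows. This is an illustration only: it assumes the 21-bit address carries the row in its high-order 12 bits and the column in its low-order 9 bits, and the function name is hypothetical.

```python
ROW_BITS = 12   # selects one of 4096 rows
COL_BITS = 9    # selects one of 512 bytes within the row


def split_dram_address(address):
    """Return (row, column) for a 21-bit DRAM address.

    Assumption: the row occupies the high-order 12 bits and the column
    the low-order 9 bits. The row would be presented first (latched by
    RAS), then the column (latched by CAS).
    """
    if not 0 <= address < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 21 bits")
    row = address >> COL_BITS
    column = address & ((1 << COL_BITS) - 1)
    return row, column


print(split_dram_address(512))       # first byte of row 1
print(split_dram_address(0x1FFFFF))  # last byte of the last row
```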

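[Figure: internal organization of a 16-megabit DRAM chip with a 4096 × (512 × 8) cell array. Address lines A20-9 / A8-0 are multiplexed: the row address latch (strobed by RAS) feeds the row decoder, and the column address latch (strobed by CAS) feeds the column decoder. Sense/Write circuits connect to data lines D7 to D0 and are controlled by CS and R/W.]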

Synchronous DRAMs

The operations of SDRAM are controlled by a clock signal.

Figure 5.8. Synchronous DRAM. Blocks shown: row address latch and row decoder, column address counter and column decoder, cell array, Read/Write circuits and latches, refresh counter, data input register, data output register, and mode register and timing control. Signals: Row/Column address, Clock, RAS, CAS, R/W, CS, and Data.

• Operation is directly synchronized with the processor clock signal.
• The outputs of the sense circuits are connected to a latch.
• During a Read operation, the contents of the cells in a row are loaded onto the latches.
• During a refresh operation, the contents of the cells are refreshed without changing the contents of the latches.
• Data held in the latches that correspond to the selected columns are transferred to the output.

Synchronous DRAMs

• Refresh circuits are included (refresh every 64 ms).
• Clock frequency > 100 MHz.
• Intel PC100 and PC133.

Latency and Bandwidth

• Memory latency: the amount of time it takes to transfer a word of data to or from the memory.
• Memory bandwidth: the number of bits or bytes that can be transferred in one second. It is used to measure how much time is needed to transfer an entire block of data.
• Bandwidth is the product of the rate at which data are transferred (and accessed) and the width of the data bus.
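As a rough worked example of the bandwidth product, the sketch below multiplies a transfer rate by a bus width. The 133 MHz rate and the 64-bit bus are assumed values chosen for illustration, not figures from the slides, and the function names are hypothetical.

```python
def bandwidth_bytes_per_second(transfers_per_second, bus_width_bits):
    """Bandwidth = transfer rate x data bus width, expressed in bytes/s."""
    return transfers_per_second * bus_width_bits / 8


def block_transfer_time(block_bytes, bandwidth):
    """Seconds needed to move a block once transfers are streaming.

    This ignores the initial latency of the first word.
    """
    return block_bytes / bandwidth


# Assumed example: one transfer per cycle of a 133 MHz clock on a 64-bit bus.
bw = bandwidth_bytes_per_second(133_000_000, 64)
print(f"{bw / 1e6:.0f} MB/s")
print(f"{block_transfer_time(64, bw) * 1e9:.1f} ns for a 64-byte block")
```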

Double-Data-Rate SDRAM

• Standard SDRAM performs all actions on the rising edge of the clock signal.
• DDR SDRAM accesses the cell array in the same way, but transfers the data on both edges of the clock.
• The cell array is organized in two banks. Each can be accessed separately.

DDR-SDRAM

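Static memories

[Figure: the 21-bit address is split into a 19-bit internal chip address and 2 high-order bits (A19, A20) that feed a 2-bit decoder generating the chip-select signals. Each 512K × 8 memory chip has a 19-bit address input, an 8-bit data input/output, and a chip select. The four chip columns supply data lines D31-24, D23-16, D15-8, and D7-0.]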

Organization of a 2M × 32 memory module using 512K × 8 static chips

• Implement a memory unit of 2M words of 32 bits each.
• Use 512K × 8 static memory chips.
• Each column consists of 4 chips.
• Each chip implements one byte position.
• A chip is selected by setting its chip select control line to 1.
• A selected chip places its data on the data output lines; the outputs of the other chips are in the high-impedance state.
• 21 bits are needed to address a 32-bit word.
• The high-order 2 bits are decoded to select one of the four rows of chips by activating one of the four Chip Select signals.
• The remaining 19 bits are used to access specific byte locations inside each chip of the selected row.
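The chip count and the address decoding for this module can be sketched as below. The function names are illustrative, and the code assumes the two high-order address bits drive the chip-select decoder, as in the organization described above.

```python
CHIP_ADDRESS_BITS = 19   # 512K locations inside each chip
ROW_SELECT_BITS = 2      # decoded into four Chip Select signals


def chips_needed(words, word_bits, chip_words, chip_bits):
    """Rows of chips times chips per row."""
    return (words // chip_words) * (word_bits // chip_bits)


def decode_module_address(word_address):
    """Split a 21-bit word address into (chip-select vector, chip address)."""
    if not 0 <= word_address < (1 << (CHIP_ADDRESS_BITS + ROW_SELECT_BITS)):
        raise ValueError("address must fit in 21 bits")
    row = word_address >> CHIP_ADDRESS_BITS
    chip_address = word_address & ((1 << CHIP_ADDRESS_BITS) - 1)
    chip_select = [1 if r == row else 0 for r in range(1 << ROW_SELECT_BITS)]
    return chip_select, chip_address


print(chips_needed(2 * 2**20, 32, 512 * 2**10, 8))  # 4 rows x 4 columns
print(decode_module_address(0x100005))
```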

Memory System Considerations

The choice of a RAM chip for a given application depends on several factors:
• Cost, speed, power, size, and so on.
• SRAMs are faster and more expensive, with smaller capacity.
• DRAMs are slower and cheaper, with larger capacity.

Dynamic memories

• Large dynamic memory systems can be implemented using DRAM chips in a similar way to static memory systems.
• Placing large memory systems directly on the motherboard will occupy a large amount of space. Also, this arrangement is inflexible since the memory system cannot be expanded easily.
• Packaging considerations have led to the development of larger memory units known as SIMMs (Single In-line Memory Modules) and DIMMs (Dual In-line Memory Modules).
• Memory modules are an assembly of memory chips on a small board that plugs vertically into a single socket on the motherboard. They occupy less space on the motherboard and allow for easy expansion by replacement.

Memory Controller

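Figure 5.11. Use of a memory controller. The processor sends Address, R/W, Request, and Clock to the memory controller. The controller drives the memory with a multiplexed Row/Column address together with RAS, CAS, R/W, CS, and Clock. Data lines connect the processor and the memory.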

Read-Only Memories

ROM

• ROM is used for storing programs that are permanently resident in the computer and for tables of constants that do not change in value once the production of the computer is completed.
• The ROM portion of main memory is needed for storing an initial program called the bootstrap loader.

Figure 5.12. A ROM cell. Transistor T is controlled by the word line and sits between the bit line and point P. Point P is left unconnected to store a 1 and connected to store a 0.

Read-Only Memory

• ROM
• PROM: programmable ROM
• EPROM: erasable, reprogrammable ROM
• EEPROM: can be programmed and erased electrically

Flash Memory

• Difference from EEPROM: it is only possible to write an entire block of cells, but the contents of a single cell can be read.
• Low power consumption.
• Used in portable equipment.
• Flash cards, flash drives.

Cache Memories

Cache

• When the processor issues a Read request, a block of words is transferred from the main memory to the cache, one word at a time. Subsequent references to the data in this block of words are found in the cache.
• At any given time, only some blocks in the main memory are held in the cache. Which blocks in the main memory are in the cache is determined by a "mapping function".
• When the cache is full and a block of words needs to be transferred from the main memory, some block of words in the cache must be replaced. Which one is determined by a "replacement algorithm".

Figure 5.14. Use of a cache memory. The cache sits between the processor and the main memory.

Cache memory

• If the active portions of the program and data are placed in a fast, small memory, the average memory access time can be reduced, thus reducing the total execution time of the program.
• Such a fast, small memory is referred to as cache memory.
• The cache is the fastest component in the memory hierarchy and approaches the speed of CPU components.
• When the CPU needs to access memory, the cache is examined first.
• Locality of reference: temporal and spatial.
• Cache block (cache line): a set of contiguous address locations of some size.

Principle of Locality

Principle of locality (or locality of reference):
• A program accesses a relatively small portion of the address space at any instant of time.
• Two kinds: temporal locality and spatial locality.

Temporal locality (locality in time):
• Keep the most recently accessed data items closer to the processor.

Spatial locality (locality in space):
• Move blocks consisting of contiguous words to 'upper' levels.

Cache Hit: the data appears in some block of the upper level.
• Hit rate: the fraction of memory accesses found in the upper level.

Cache Miss: the data is not found in the upper level and needs to be retrieved from a block in the lower level.

[Figure: the CPU accesses the cache; on a hit the cache supplies the data, on a miss the data comes from main memory.]

• The basic characteristic of cache memory is its fast access time. Therefore, very little or no time must be wasted when searching for words in the cache.
• The transfer of data from main memory to cache memory is referred to as a mapping process. There are three types of mapping:
  - Associative mapping
  - Direct mapping
  - Set-associative mapping
• In the general case, there are 2^k words in cache memory and 2^n words in main memory.
• The n-bit memory address is divided into two fields: k bits for the index and n-k bits for the tag field.

Direct mapping

[Figure: direct-mapped cache. Main memory blocks 0 to 4095 map onto cache blocks 0 to 127, each cache block holding a tag. The main memory address is divided into Tag (5 bits), Block (7 bits), and Word (4 bits).]

• Block j of the main memory maps to block j modulo 128 of the cache. Block 0 maps to 0, block 129 maps to 1.
• More than one memory block is mapped onto the same position in the cache.
• May lead to contention for cache blocks even if the cache is not full.
• Resolve the contention by allowing the new block to replace the old block, leading to a trivial replacement algorithm.
• Memory address is divided into three fields:
  - Low order 4 bits determine one of the 16 words in a block.
  - When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in.
  - High order 5 bits determine which of the possible 32 blocks is currently present in the cache. These are tag bits.
• Simple to implement but not very flexible.

Example: main memory address 11101 1111111 1100 (Tag 5 bits, Block 7 bits, Word 4 bits)
Tag: 11101
Block: 1111111 = 127, so the data is in block 127 of the cache
Word: 1100 = 12, so it is word 12 of block 127 in the cache
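The field extraction in this example can be reproduced with a few shifts and masks. The sketch assumes the 16-bit address layout given above (5-bit tag, 7-bit block, 4-bit word); the function names are illustrative.

```python
WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 cache blocks


def split_direct_mapped(address):
    """Return (tag, cache block, word) for a 16-bit main memory address."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


def cache_block_for(memory_block):
    """Main memory block j maps to cache block j modulo 128."""
    return memory_block % (1 << BLOCK_BITS)


address = 0b11101_1111111_1100
print(split_direct_mapped(address))      # fields as decimal numbers
print(cache_block_for(129))              # block 129 shares cache block 1
```

The two views agree: the main memory block number is the address without its word field, and taking it modulo 128 gives the same cache block as the 7-bit Block field.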

Associative mapping

• A main memory block can be placed into any cache position.
• Memory address is divided into two fields:
  - Low order 4 bits identify the word within a block.
  - High order 12 tag bits identify which of the 4096 blocks (4096 = 2^12) is resident in the cache.
• Flexible, and uses cache space efficiently.
• Replacement algorithms can be used to replace an existing block in the cache when the cache is full.

[Figure: associative-mapped cache. Any of main memory blocks 0 to 4095 can occupy any of cache blocks 0 to 127, each cache block holding a tag. The main memory address is divided into Tag (12 bits) and Word (4 bits).]

Example: main memory address 111011111111 1100 (Tag 12 bits, Word 4 bits)
Tag: 111011111111
Word: 1100 = 12, so it is word 12 of a block in the cache
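Because a block may sit anywhere, a lookup has to compare the tag against every cache position. The sketch below does this sequentially for clarity; real hardware compares all tags in parallel. The names and the list-based cache model are assumptions for illustration.

```python
WORD_BITS = 4


def split_associative(address):
    """Return (tag, word) for a 16-bit address with a 12-bit tag."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(cache_tags, address):
    """Return (cache position, word) on a hit, or None on a miss.

    cache_tags is a list with one stored tag (or None) per cache block.
    """
    tag, word = split_associative(address)
    for position, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return position, word
    return None


tags = [None] * 128
tags[5] = 0b111011111111          # the block happens to sit in position 5
print(lookup(tags, 0b111011111111_1100))
print(lookup(tags, 0b000000000001_0000))
```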

Set-Associative Mapping

• Blocks of the cache are grouped into sets.
• The mapping function allows a block of the main memory to reside in any block of a specific set.
• Divide the cache into 64 sets, with two blocks per set.
• Memory address is divided into three fields:
  - Low order 4 bits identify the word within a block.
  - 6-bit Set field determines the set number.
  - High order 6-bit tag field is compared to the tag fields of the two blocks in a set.
• The number of blocks per set is a design parameter.

[Figure: set-associative-mapped cache with two blocks per set. Cache blocks 0 to 127 are grouped into Set 0 (blocks 0 and 1), Set 1 (blocks 2 and 3), ..., Set 63 (blocks 126 and 127), each block holding a tag. Main memory blocks 0 to 4095 map onto the sets. The main memory address is divided into Tag (6 bits), Set (6 bits), and Word (4 bits).]

Example: main memory address 111011 111111 1100 (Tag 6 bits, Set 6 bits, Word 4 bits)
Tag: 111011
Set: 111111 = 63, so the data is in set 63 of the cache
Word: 1100 = 12, so it is word 12 of a block in set 63
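A short sketch of the set-associative case: the set field picks a set, and the tag is compared only against the blocks in that set. It assumes the 6/6/4 address layout above; the data structure (a list of per-set tag lists) and the names are illustrative.

```python
WORD_BITS = 4
SET_BITS = 6      # 64 sets


def split_set_associative(address):
    """Return (tag, set index, word) for a 16-bit address."""
    word = address & ((1 << WORD_BITS) - 1)
    set_index = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


def is_hit(sets, address):
    """True if the tag matches one of the blocks in the selected set."""
    tag, set_index, _ = split_set_associative(address)
    return tag in sets[set_index]


sets = [[None, None] for _ in range(64)]   # two blocks per set
sets[63][1] = 0b111011
print(split_set_associative(0b111011_111111_1100))
print(is_hit(sets, 0b111011_111111_1100))
```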

Replacement Algorithms

• It is difficult to determine which blocks to kick out.
• The cache controller tracks references to all blocks as computation proceeds.
• Tracking counters are increased or cleared when a hit or miss occurs.
• For associative and set-associative caches: which location should be emptied when the cache is full and a miss occurs?
  - First In First Out (FIFO)
  - Least Recently Used (LRU)

Replacement Algorithms: FIFO

CPU reference  A     B     C     A     D     E     A     D     C     F
Result         Miss  Miss  Miss  Hit   Miss  Miss  Miss  Hit   Hit   Miss
Cache (FIFO)   A     AB    ABC   ABC   ABCD  EBCD  EACD  EACD  EACD  EAFD

Replacement Algorithms: LRU

CPU reference  A     B     C     A     D     E     A     D     C     F
Result         Miss  Miss  Miss  Hit   Miss  Miss  Hit   Hit   Hit   Miss
Cache (LRU)    A     BA    CBA   ACB   DACB  EDAC  AEDC  DAEC  CDAE  FCDA
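The two traces above can be reproduced with a small simulation. The sketch assumes a cache that holds four blocks, which is what the traces imply; the function names are illustrative.

```python
from collections import OrderedDict, deque


def simulate_fifo(references, capacity):
    """Replace the block that has been in the cache longest."""
    cache = deque()                      # oldest block on the left
    results = []
    for block in references:
        if block in cache:
            results.append("Hit")        # a hit does not change FIFO order
        else:
            results.append("Miss")
            if len(cache) == capacity:
                cache.popleft()
            cache.append(block)
    return results


def simulate_lru(references, capacity):
    """Replace the block that was referenced least recently."""
    cache = OrderedDict()                # least recently used block first
    results = []
    for block in references:
        if block in cache:
            results.append("Hit")
            cache.move_to_end(block)     # a hit refreshes recency
        else:
            results.append("Miss")
            if len(cache) == capacity:
                cache.popitem(last=False)
            cache[block] = True
    return results


refs = list("ABCADEADCF")
print("FIFO:", simulate_fifo(refs, 4))
print("LRU: ", simulate_lru(refs, 4))
```

The only difference between the two runs is the seventh reference (A): FIFO has already evicted A when E arrived, while LRU kept it because it was referenced again after being loaded.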

Virtual Memories

• Main memory is smaller than the address space.
• Example: a 32-bit address allows an address space of 4G bytes, but main memory may only be a few hundred megabytes.
• Parts of a program not in main memory are stored on secondary storage devices, such as disks.
• Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques.
• The operating system moves programs and data automatically between the physical main memory and secondary storage (virtual memory).

Virtual memory organization

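[Figure: virtual memory organization. The processor issues a virtual address to the MMU, which produces a physical address for the cache and main memory. Data moves between the processor, cache, and main memory, and between main memory and disk storage by DMA transfer.]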

• The memory management unit (MMU) translates virtual addresses into physical addresses.
• If the desired data or instructions are in the main memory, they are fetched as described previously.
• If the desired data or instructions are not in the main memory, they must be transferred from secondary storage to the main memory.
• The MMU causes the operating system to bring the data from secondary storage into the main memory.
• Virtual addresses will be translated into physical addresses.
• The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage, similar to a cache.

Address translation

• Assume that program and data are composed of fixed-length units called pages. A page consists of a block of words that occupy contiguous locations in the main memory.
• A page is the basic unit of information that is transferred between secondary storage and main memory.
• The size of a page commonly ranges from 2K to 16K bytes.
• Each virtual or logical address generated by a processor is interpreted as a virtual page number (high-order bits) plus an offset (low-order bits) that specifies the location of a particular byte within that page.
• Information about the main memory location of each page is kept in the page table.
• An area of the main memory that can hold a page is called a page frame.
• The starting address of the page table is kept in a page table base register.
• The page table entry for a page includes:
  - The address of the page frame where the page resides in the main memory.
  - Some control bits.
• The virtual page number generated by the processor is added to the contents of the page table base register. This provides the address of the corresponding entry in the page table.
• The contents of this location in the page table give the starting address of the page if the page is currently in the main memory.
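The translation steps above can be sketched in a few lines. This is a simplified model, not a description of any particular MMU: it assumes 4K-byte pages (within the 2K to 16K range mentioned), represents the page table as a Python list indexed by virtual page number (standing in for "page table base register + virtual page number"), and uses a single "present" control bit. All names are hypothetical.

```python
OFFSET_BITS = 12                 # assumed page size: 4K bytes
PAGE_SIZE = 1 << OFFSET_BITS


class PageFault(Exception):
    """Raised when the referenced page is not in main memory."""


def translate(virtual_address, page_table):
    """Translate a virtual address into a physical address."""
    virtual_page = virtual_address >> OFFSET_BITS     # high-order bits
    offset = virtual_address & (PAGE_SIZE - 1)        # low-order bits
    entry = page_table[virtual_page]                  # PTBR + page number
    if not entry["present"]:
        raise PageFault(virtual_page)
    return (entry["frame"] << OFFSET_BITS) | offset


example_table = [
    {"frame": 5, "present": True},      # virtual page 0 is in frame 5
    {"frame": None, "present": False},  # virtual page 1 is on disk
]

print(hex(translate(0x0ABC, example_table)))
try:
    translate(0x1000, example_table)
except PageFault as fault:
    print("page fault on virtual page", fault.args[0])
```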

[Figure: virtual-memory address translation. The virtual address from the processor consists of a virtual page number and an offset. The page table base register is added (+) to the virtual page number to give the page table address. The selected page table entry holds control bits and the page frame in memory; the page frame combined with the offset forms the physical address in main memory.]

• The virtual address is interpreted as a page number and an offset.
• The PTBR holds the address of the page table.
• PTBR + virtual page number give the entry for the page in the page table.
• The page table holds information about each page. This includes the starting address of the page in the main memory.

• The page table entry for a page also includes some control bits which describe the status of the page while it is in the main memory.
• The page table information is used by the MMU for every access, so ideally it would be held within the MMU. In practice the MMU holds only a small portion of it, which consists of the page table entries that correspond to the most recently accessed pages.
• A small cache called the Translation Lookaside Buffer (TLB) is included in the MMU.
  - The TLB holds page table entries of the most recently accessed pages.
  - In addition to the page table entry, the TLB must hold the virtual page number for each page.

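[Figure: use of a TLB. The virtual page number of the virtual address from the processor is compared (=?) with the virtual page numbers stored in the TLB. Each TLB entry holds a virtual page number, control bits, and the page frame in memory. On a hit (Yes), the page frame is combined with the offset to form the physical address in main memory; otherwise (No) a miss occurs.]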

• High-order bits of the virtual address generated by the processor select the virtual page.
• These bits are compared to the virtual page numbers in the TLB.
• If there is a match, a hit occurs and the corresponding address of the page frame is read.
• If there is no match, a miss occurs and the page table within the main memory must be consulted.
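The hit/miss behaviour of the TLB can be modelled as a small lookup table keyed by virtual page number. The sketch below is a simplification under stated assumptions: 4K-byte pages, a dictionary standing in for the TLB, a list of frame numbers standing in for the page table, and no eviction of old TLB entries. The names are hypothetical.

```python
OFFSET_BITS = 12   # assumed 4K-byte pages


def translate_with_tlb(virtual_address, tlb, page_table):
    """Return (physical address, "hit" or "miss").

    tlb maps virtual page number -> page frame for recently used pages;
    page_table[vpn] gives the page frame for every resident page.
    """
    vpn = virtual_address >> OFFSET_BITS
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:
        frame, outcome = tlb[vpn], "hit"
    else:
        frame = page_table[vpn]      # consult the page table in main memory
        tlb[vpn] = frame             # remember the translation for next time
        outcome = "miss"
    return (frame << OFFSET_BITS) | offset, outcome


page_table = [7, 3, 9]
tlb = {}
print(translate_with_tlb(0x1234, tlb, page_table))   # first access to page 1
print(translate_with_tlb(0x1FF0, tlb, page_table))   # same page again
```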

If a program generates an access to a page that is not in the main memory, a page fault occurs.

Upon detection of a page fault by the MMU, the following actions occur:
• The MMU asks the operating system to intervene by raising an exception.
• Processing of the active task which caused the page fault is interrupted.
• Control is transferred to the operating system.
• The operating system copies the requested page from secondary storage to the main memory.
• Once the page is copied, control is returned to the task which was interrupted.

Performance considerations

• A key design objective of a computer system is to achieve the best possible performance at the lowest possible cost.
• Price/performance ratio is a common measure of success.
• Performance of a processor depends on:
  - How fast machine instructions can be brought into the processor for execution.
  - How fast the instructions can be executed.

Interleaving

• Divides the memory system into a number of memory modules. Each module has its own address buffer register (ABR) and data buffer register (DBR).
• Arranges addressing so that successive words in the address space are placed in different modules.
• When requests for memory access involve consecutive addresses, the accesses will be to different modules.
• Since parallel access to these modules is possible, the average rate of fetching words from the main memory can be increased.

Methods of address layouts

• Consecutive words are placed in a module.
• High-order k bits of a memory address determine the module.
• Low-order m bits of a memory address determine the word within a module.
• When a block of words is transferred from main memory to cache, only one module is busy at a time.

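[Figure: two address layouts for memory modules 0 ... i ... n-1 (or 2^k - 1), each module with its own ABR and DBR. In the first layout the MM address has k high-order Module bits followed by m bits of Address in module. In the second layout m high-order bits of Address in module are followed by k low-order Module bits.]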

• Consecutive words are located in consecutive modules.
• Low-order k bits of a memory address determine the module; high-order m bits determine the word within the module.
• While transferring a block of data, several memory modules can be kept busy at the same time.

Figure captions: (a) consecutive words in a module; (b) consecutive words in consecutive modules.
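The difference between the two layouts shows up when consecutive addresses are mapped to modules. The sketch below uses small assumed values (k = 2 module bits, m = 3 bits of address in module) purely for illustration; the function names are hypothetical.

```python
def high_order_layout(address, k, m):
    """Layout (a): the high-order k bits select the module."""
    return address >> m, address & ((1 << m) - 1)      # (module, word)


def low_order_layout(address, k, m):
    """Layout (b): the low-order k bits select the module."""
    return address & ((1 << k) - 1), address >> k      # (module, word)


K, M = 2, 3   # assumed: 4 modules of 8 words each
for address in range(4):
    print(address,
          "a:", high_order_layout(address, K, M),
          "b:", low_order_layout(address, K, M))
```

With layout (a) the four consecutive addresses all fall in module 0; with layout (b) they fall in modules 0, 1, 2, and 3, so a block transfer can keep several modules busy at once.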

Hit Rate and Miss Penalty

• The success rate in accessing information at various levels of the memory hierarchy: hit rate / miss rate.
• A miss causes extra time needed to bring the desired information into the cache.
• Hit rate can be improved by increasing block size, while considering the cache size.
• Tave = hC + (1-h)M
  Tave: average access time experienced by the processor
  h: hit rate
  M: miss penalty, the time to access information in the main memory
  C: the time to access information in the cache
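A quick numeric check of the formula, using made-up values (a cache access of 1 cycle and a miss penalty of 20 cycles) that are assumptions for illustration only:

```python
def average_access_time(h, C, M):
    """Tave = hC + (1-h)M for a single cache level."""
    return h * C + (1 - h) * M


for hit_rate in (0.90, 0.95, 0.99):
    t = average_access_time(hit_rate, C=1, M=20)
    print(f"h = {hit_rate:.2f}: Tave = {t:.2f} cycles")
```

Even a few percentage points of hit rate matter, because every miss costs M rather than C.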

Caches on the Processor Chip

• On chip vs. off chip.
• Two separate caches for instructions and data, or a single cache for both.
• The advantage of separate caches: parallelism, better performance.
• Level 1 and Level 2 caches are used in high-performance processors.
• L1 cache: faster and smaller. Accesses more than one word simultaneously and lets the processor use them one at a time. Is on the processor chip.
• L2 cache: slower and larger. May be implemented externally using SRAM chips.
• Average access time: tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M, where h1 and h2 are the hit rates in the L1 and L2 caches, C1 and C2 are the times to access information in the L1 and L2 caches, and M is the time to access information in main memory.
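The two-level formula can be evaluated the same way. The numbers below (hit rates and access times in cycles) are assumed values for illustration, and the function name is hypothetical.

```python
def two_level_access_time(h1, c1, h2, c2, m):
    """tave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M."""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m


# Assumed example: L1 hits 95% at 1 cycle, L2 hits 90% at 10 cycles,
# main memory costs 100 cycles.
print(f"{two_level_access_time(0.95, 1, 0.90, 10, 100):.2f} cycles")

# With an L2 that never hits, the expression reduces to h1*C1 + (1-h1)*M.
print(f"{two_level_access_time(0.95, 1, 0.0, 10, 100):.2f} cycles")
```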

Other Performance Enhancements

Write buffer

Write-through:
• Each write operation involves writing to the main memory.
• If the processor has to wait for the write operation to be complete, it slows down the processor.
• The processor does not depend on the results of the write operation.
• A write buffer can be included for temporary storage of write requests.
• The processor places each write request into the buffer and continues execution.

Write-back:
• A block is written back to the main memory when it is replaced.

Prefetching

• New data are brought into the processor when they are first needed.
• The processor has to wait before the data transfer is complete.
• Prefetch the data into the cache before they are actually needed, or before a Read miss occurs.
• Prefetching can be accomplished through software by including a special instruction in the machine language of the processor.
  - Inclusion of prefetch instructions increases the length of the programs.
• Prefetching can also be accomplished using hardware:
  - Circuitry that attempts to discover patterns in memory references and then prefetches according to this pattern.

Lockup-Free Cache

• A prefetching scheme does not work if it stops other accesses to the cache until the prefetch is completed.
• A cache of this type is said to be "locked" while it services a miss.
• A cache structure which supports multiple outstanding misses is called a lockup-free cache.
• Since only one miss can be serviced at a time, a lockup-free cache must include circuits that keep track of all the outstanding misses.
• Special registers may hold the necessary information about these misses.

Secondary Storage

Magnetic Hard Disks

Organization of Data on a Disk

Figure 5.30. Organization of one surface of a disk, showing sector 0 of track 0, sector 0 of track 1, and sector 3 of track n.

Access Data on a Disk

• Sector header
• Following the data, there is an error-correction code (ECC).
• Formatting process
• Difference between inner tracks and outer tracks
• Access time: seek time / rotational delay (latency time)
• Data buffer/cache

Disk Controller

Figure 5.31. Disks connected to the system bus. The processor, main memory, and disk controller attach to the system bus; the disk controller serves the disk drives.

Disk Controller functions:
• Seek
• Read
• Write
• Error checking

RAID Disk Arrays

• Redundant Array of Inexpensive Disks.
• Using multiple disks makes huge storage cheaper and also makes it possible to improve the reliability of the overall system.
• RAID0: data striping
• RAID1: identical copies of data on two disks
• RAID2, 3, 4: increased reliability
• RAID5: parity-based error recovery
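Parity-based recovery is commonly built on bytewise XOR: the parity block is the XOR of the data blocks, so any single lost block equals the XOR of the survivors and the parity. The sketch below illustrates only that idea; it does not model how RAID5 distributes parity across disks, and the function names are hypothetical.

```python
def parity_block(blocks):
    """Bytewise XOR of equally sized blocks."""
    parity = bytes(len(blocks[0]))               # all-zero starting value
    for block in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, block))
    return parity


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity_block(data)
rebuilt = recover_missing([data[0], data[2]], p)  # pretend block 1 was lost
print(rebuilt == data[1])
```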

Optical Disks

• CD-ROM
• CD-Recordable (CD-R)
• CD-ReWritable (CD-RW)
• DVD
• DVD-RAM

Figure 5.32. Optical disk.
(a) Cross-section: label, acrylic, aluminum, and polycarbonate plastic layers, with pits and lands.
(b) Transition from pit to land: a source and detector see reflection over a pit or a land, and no reflection at the transition between them.
(c) Stored binary pattern: 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0

Magnetic Tape Systems

Figure 5.33. Organization of data on magnetic tape. Data are recorded 7 or 9 bits across the tape. Records are separated by record gaps and grouped into files; files are delimited by file marks and file gaps.
