1 CSCI 2510 Computer Organization Memory System I Organization.

1

CSCI 2510Computer Organization

Memory System I

Organization

2

Who Cares About the Memory Hierarchy?

Processor60%/yr.(2x/1.5 yr)

DRAM9%/yr.(2x/10 yrs)1

10

100

1000

198

0198

1 198

3198

4198

5 198

6198

7198

8198

9199

0199

1 199

2199

3199

4199

5199

6199

7199

8 199

9200

0

DRAM

CPU198

2

Processor-MemoryPerformance Gap:(grows 50% / year)

Per

form

ance

Time

“Moore’s Law”

Processor-DRAM Memory Performance Gap

Processor Growth Curve follows

3

Memory System Revisited

Maximum size of memory is determined by addressing schemeE.g. 16-bit addresses can only address 216

= 65536 memory locationsMost machines are byte-addressable

each memory address location refers to a byteMost machines retrieve/ store data in wordsCommon abbreviations

1k ~= 210 (kilo)1M ~= 220 (Mega)1G ~= 230 (Giga)1T ~= 240 (Tera)

4

Simplified View

Data transfer takes place through the MAR (memory address register) and MDR (memory data register)

Up to 2 k addressableMDR

MAR

k-bitaddress bus

n-bitdata bus

Control lines

( , MFC, etc.)

Processor Memory

locations

Word length = n bits

WR /

Several addressablelocations (bytes) are grouped into a word

5

Philosophy of the Big Picture

Processor usually runs much faster than main memorySmall memories are fast, large memories are slowUse a cache memory to store data that is likely to be

used

Main memory is limitedUse virtual memory to increase apparent size of

physical memory by moving unused sections of memory to disk (automatically).

A translation between virtual and physical addresses is done by a memory management unit (MMU)

To be discussed in later lectures

6

Memory Hierarchy of a Modern Computer System

By taking advantage of the principle of locality: Present the user with as much memory as is available in the

cheapest technology. Provide access at the speed offered by the fastest

technology.

Control

Datapath

SecondaryStorage(Disk)

Processor

Registers

MainMemory(DRAM)

SecondLevelCache

(SRAM)

On

-Ch

ipC

ache

~1 ns Tens msSpeed: Tens ns Hundreds ns – 1 us

Hundreds Giga'sSize (bytes): kilo's Mega's

TertiaryStorage(Tape)

Tens sec

Tera's

7

Definitions

Memory Access Time= time between start and finish of a memory request

Memory Cycle Time= minimum delay between successive memory operations

Random Access Memory (RAM)property: comparable access time for any memory locations

8

Random Access Memory (RAM)

RAM types Static RAM (SRAM) are fast but costly, so the capacity is small.

Dynamic RAMs (DRAM) store data as electric charge on a capacitor.Charge leaks away with time so DRAMs must be refreshed. In return for this trouble, the density is much higher.The common type used today is called Synchronous DRAM (SDRAM) as it

uses a clock to synchronize the operation.

Double Data Rate (DDR) SDRAM transfers data on both clock edges (normal SDRAMs only operate once per clock cycle).

DDR-2 (4x basic memory clock) and DDR-3 (8x basic memory clock) are on the market.

Memory modules are used to hold several SDRAM chips and are the standard type used in a computer’s motherboard.

9

Comparing bandwidthTable below compares the theoretical bandwidths of different RAM types.However, SDRAM does not perform as good as the figures shown.This is due to latencies.

RAM Type Theoretical Maximum Bandwidth

SDRAM 100 MHz (PC100) 100 MHz X 64 bit/ cycle = 800 MByte/sec

SDRAM 133 MHz (PC133) 133 MHz X 64 bit/ cycle = 1064 MByte/sec

DDR SDRAM 200 MHz (PC1600) 2 X 100 MHz X 64 bit/ cycle ~= 1600 MByte/sec



DDR-2 SDRAM 667 MHz (PC2-5400) 2 X 2 X 166 MHz X 64 bit/ cycle ~= 5400 MByte/sec

DDR-2 SDRAM 800 MHz (PC2-6400) 2 X 2 X 200 MHz X 64 bit/ cycle ~= 6400 MByte/sec

Performance of SDRAM

ClockCycles

1 Hertz is 1 Cycle per second

10

Speed VS Latency

Can you differentiate these two?Peter acts FAST but he's always LATE.Mary is always PUNCTUAL but she is SLOW.

When we say memory speed, we are talking about Memory Cycle Time. It is related to memory bandwidth.

When we say latency, we are talking about Memory Access Time.How is the response.Usually measure in "number of cycles" like CL3

11

Memory Controller A memory controller is normally used to interface between the memory

and the processor.

DRAMs have a slightly more complex interface as they need refreshing and they usually time-multiplex signals to reduce number of pins.

SRAM interfaces are simpler and may not need a memory controller.

Processor

RAS

CAS

R/ W

Clock

Address

Row/Columnaddress

Memorycontroller

R/ W

Clock

Request

CS

Data

Memory

12

Non-Volatile Memories

Read-Only Memory (ROM)Memory content fixed and cannot be changed (easily)

FLASH MemoryCan be read and writtenThis is normally used for non-volatile storageFlash cards are made from FLASH chips

Useful to bootstrap a computer as RAM is volatile (i.e. lost memory) when power removed

13

Memory Hierarchy

Aim: to produce fast, big and cheap memory

L1, L2 cache are usually SRAM

Main memory is DRAM

Relies on locality of reference

Pr ocessor

Primarycache

Secondarycache

Main

Magnetic disk

memory

Increasingsize

Increasingspeed

secondarymemory

Increasingcost per bit

Re gisters

L1

L2

Increasinglatency

14

Locality of Reference

Temporal Locality (locality in time)If an item is referenced, it will tend to be referenced

again soon.When information item (instruction or data) is first

needed, brought it into cache where it will hopefully be used again

Spatial Locality (locality in space)If an item is referenced, neighbourhood items whose

addresses are close-by will tend to be referenced soon.

Rather than a single word, fetch data from adjacent addresses as well.

15

Mix-and-Match: Best of Both

By taking advantage of the principle of locality:Present the user with as much memory as is available

in the cheapest technology.Provide access at the speed offered by the fastest

technology.

DRAM is slow but cheap and dense:Good choice for presenting the user with a BIG memory

system

SRAM is fast but expensive and not very dense:Good choice for providing the user FAST access time.

16

Cache Usage

Need to determine how the cache is organized and "mapped" to the main memory. Mapping functions determine how addresses are assigned to

cache locations

Need to have a replacement algorithm to decide what to do when cache is full.

i.e. decide which item to be unloaded from cache.Definition: Block refers to a set of contiguous addresses of

given size (cache block is also called cache line)

Cache MainMemoryProcessor

17

Cache Operations

Read request:Contents of a block are read into the cache the first time

from the memory

Subsequent accesses are (hopefully) from the cache, called a cache read hit

Number of cache entries is relatively small, need to keep most likely used data in cache

When an un-cached block is required, need to employ a replacement algorithm to remove an old block and to create space for the new

18

Cache Operations

Write operation:Scheme 1: Cache and main memory updated at the

same time (write-through)

Scheme 2: Update cache only and mark the entry dirty. Main memory will be updated later when cache block is removed (write-back)

Which scheme is simpler?Which one has better performance? Why?

Note that read misses and read hits, write misses and write hits can occur

19

Summary

Processor usually runs much faster than main memory

Common RAM types: SRAM, DRAM, SDRAM, DDR SDRAM

Principle of locality: Temporal and SpatialPresent the user with as much memory as is available in the

cheapest technology.Provide access at the speed offered by the fastest

technology.

Memory hierarchy: Register Cache Main Memory Disk Tape

1 CSCI 2510 Computer Organization Memory System I Organization.

Documents

Transcript of 1 CSCI 2510 Computer Organization Memory System I Organization.