1 CSCI 2510 Computer Organization Memory System I Organization.
-
Upload
eleanor-mcdonald -
Category
Documents
-
view
235 -
download
0
Transcript of 1 CSCI 2510 Computer Organization Memory System I Organization.
1
CSCI 2510Computer Organization
Memory System I
Organization
2
Who Cares About the Memory Hierarchy?
Processor60%/yr.(2x/1.5 yr)
DRAM9%/yr.(2x/10 yrs)1
10
100
1000
198
0198
1 198
3198
4198
5 198
6198
7198
8198
9199
0199
1 199
2199
3199
4199
5199
6199
7199
8 199
9200
0
DRAM
CPU198
2
Processor-MemoryPerformance Gap:(grows 50% / year)
Per
form
ance
Time
“Moore’s Law”
Processor-DRAM Memory Performance Gap
Processor Growth Curve follows
3
Memory System Revisited
Maximum size of memory is determined by addressing schemeE.g. 16-bit addresses can only address 216
= 65536 memory locationsMost machines are byte-addressable
each memory address location refers to a byteMost machines retrieve/ store data in wordsCommon abbreviations
1k ~= 210 (kilo)1M ~= 220 (Mega)1G ~= 230 (Giga)1T ~= 240 (Tera)
4
Simplified View
Data transfer takes place through the MAR (memory address register) and MDR (memory data register)
Up to 2 k addressableMDR
MAR
k-bitaddress bus
n-bitdata bus
Control lines
( , MFC, etc.)
Processor Memory
locations
Word length = n bits
WR /
Several addressablelocations (bytes) are grouped into a word
5
Philosophy of the Big Picture
Processor usually runs much faster than main memorySmall memories are fast, large memories are slowUse a cache memory to store data that is likely to be
used
Main memory is limitedUse virtual memory to increase apparent size of
physical memory by moving unused sections of memory to disk (automatically).
A translation between virtual and physical addresses is done by a memory management unit (MMU)
To be discussed in later lectures
6
Memory Hierarchy of a Modern Computer System
By taking advantage of the principle of locality: Present the user with as much memory as is available in the
cheapest technology. Provide access at the speed offered by the fastest
technology.
Control
Datapath
SecondaryStorage(Disk)
Processor
Registers
MainMemory(DRAM)
SecondLevelCache
(SRAM)
On
-Ch
ipC
ache
~1 ns Tens msSpeed: Tens ns Hundreds ns – 1 us
Hundreds Giga'sSize (bytes): kilo's Mega's
TertiaryStorage(Tape)
Tens sec
Tera's
7
Definitions
Memory Access Time= time between start and finish of a memory request
Memory Cycle Time= minimum delay between successive memory operations
Random Access Memory (RAM)property: comparable access time for any memory locations
8
Random Access Memory (RAM)
RAM types Static RAM (SRAM) are fast but costly, so the capacity is small.
Dynamic RAMs (DRAM) store data as electric charge on a capacitor.Charge leaks away with time so DRAMs must be refreshed. In return for this trouble, the density is much higher.The common type used today is called Synchronous DRAM (SDRAM) as it
uses a clock to synchronize the operation.
Double Data Rate (DDR) SDRAM transfers data on both clock edges (normal SDRAMs only operate once per clock cycle).
DDR-2 (4x basic memory clock) and DDR-3 (8x basic memory clock) are on the market.
Memory modules are used to hold several SDRAM chips and are the standard type used in a computer’s motherboard.
9
Comparing bandwidthTable below compares the theoretical bandwidths of different RAM types.However, SDRAM does not perform as good as the figures shown.This is due to latencies.
RAM Type Theoretical Maximum Bandwidth
SDRAM 100 MHz (PC100) 100 MHz X 64 bit/ cycle = 800 MByte/sec
SDRAM 133 MHz (PC133) 133 MHz X 64 bit/ cycle = 1064 MByte/sec
DDR SDRAM 200 MHz (PC1600) 2 X 100 MHz X 64 bit/ cycle ~= 1600 MByte/sec
DDR SDRAM 266 MHz (PC2100) 2 X 133 MHz X 64 bit/ cycle ~= 2100 MByte/sec
DDR SDRAM 333 MHz (PC2600) 2 X 166 MHz X 64 bit/ cycle ~= 2600 MByte/sec
DDR-2 SDRAM 667 MHz (PC2-5400) 2 X 2 X 166 MHz X 64 bit/ cycle ~= 5400 MByte/sec
DDR-2 SDRAM 800 MHz (PC2-6400) 2 X 2 X 200 MHz X 64 bit/ cycle ~= 6400 MByte/sec
Performance of SDRAM
ClockCycles
1 Hertz is 1 Cycle per second
10
Speed VS Latency
Can you differentiate these two?Peter acts FAST but he's always LATE.Mary is always PUNCTUAL but she is SLOW.
When we say memory speed, we are talking about Memory Cycle Time. It is related to memory bandwidth.
When we say latency, we are talking about Memory Access Time.How is the response.Usually measure in "number of cycles" like CL3
11
Memory Controller A memory controller is normally used to interface between the memory
and the processor.
DRAMs have a slightly more complex interface as they need refreshing and they usually time-multiplex signals to reduce number of pins.
SRAM interfaces are simpler and may not need a memory controller.
Processor
RAS
CAS
R/ W
Clock
Address
Row/Columnaddress
Memorycontroller
R/ W
Clock
Request
CS
Data
Memory
12
Non-Volatile Memories
Read-Only Memory (ROM)Memory content fixed and cannot be changed (easily)
FLASH MemoryCan be read and writtenThis is normally used for non-volatile storageFlash cards are made from FLASH chips
Useful to bootstrap a computer as RAM is volatile (i.e. lost memory) when power removed
13
Memory Hierarchy
Aim: to produce fast, big and cheap memory
L1, L2 cache are usually SRAM
Main memory is DRAM
Relies on locality of reference
Pr ocessor
Primarycache
Secondarycache
Main
Magnetic disk
memory
Increasingsize
Increasingspeed
secondarymemory
Increasingcost per bit
Re gisters
L1
L2
Increasinglatency
14
Locality of Reference
Temporal Locality (locality in time)If an item is referenced, it will tend to be referenced
again soon.When information item (instruction or data) is first
needed, brought it into cache where it will hopefully be used again
Spatial Locality (locality in space)If an item is referenced, neighbourhood items whose
addresses are close-by will tend to be referenced soon.
Rather than a single word, fetch data from adjacent addresses as well.
15
Mix-and-Match: Best of Both
By taking advantage of the principle of locality:Present the user with as much memory as is available
in the cheapest technology.Provide access at the speed offered by the fastest
technology.
DRAM is slow but cheap and dense:Good choice for presenting the user with a BIG memory
system
SRAM is fast but expensive and not very dense:Good choice for providing the user FAST access time.
16
Cache Usage
Need to determine how the cache is organized and "mapped" to the main memory. Mapping functions determine how addresses are assigned to
cache locations
Need to have a replacement algorithm to decide what to do when cache is full.
i.e. decide which item to be unloaded from cache.Definition: Block refers to a set of contiguous addresses of
given size (cache block is also called cache line)
Cache MainMemoryProcessor
17
Cache Operations
Read request:Contents of a block are read into the cache the first time
from the memory
Subsequent accesses are (hopefully) from the cache, called a cache read hit
Number of cache entries is relatively small, need to keep most likely used data in cache
When an un-cached block is required, need to employ a replacement algorithm to remove an old block and to create space for the new
18
Cache Operations
Write operation:Scheme 1: Cache and main memory updated at the
same time (write-through)
Scheme 2: Update cache only and mark the entry dirty. Main memory will be updated later when cache block is removed (write-back)
Which scheme is simpler?Which one has better performance? Why?
Note that read misses and read hits, write misses and write hits can occur
19
Summary
Processor usually runs much faster than main memory
Common RAM types: SRAM, DRAM, SDRAM, DDR SDRAM
Principle of locality: Temporal and SpatialPresent the user with as much memory as is available in the
cheapest technology.Provide access at the speed offered by the fastest
technology.
Memory hierarchy: Register Cache Main Memory Disk Tape