Main Memory
by J. Nelson Amaral
CMPUT 229
Types of Memories
Read/Write Memory (RWM):
we can store and retrieve data.
Random Access Memory (RAM):
the time required to read or write a bit of memory is independent of the bit’s location.
Static Random Access Memory (SRAM):
once a word is written to a location, it remains stored as long as power is applied to the chip, unless the location is written again.
Dynamic Random Access Memory (DRAM):
the data stored at each location must be refreshed periodically by reading it and then writing it back again, or else it disappears.
CMPUT 329 - Computer Organization and Architecture II 3
Static × Dynamic Memory Cell
Static Memory Cell (6 transistors)
Dynamic Memory Cell (1 transistor)
[Figure: memory cell schematics with word line and bit line.]
Writing a 1 in a Dynamic Memory
To store a 1 in this cell, a HIGH voltage is placed on the bit line, causing the capacitor to charge through the on transistor.
[Figure: DRAM cell with word line and bit line.]
Writing a 0 in a Dynamic Memory
To store a 0 in this cell, a LOW voltage is placed on the bit line, causing the capacitor to discharge through the on transistor.
[Figure: DRAM cell with word line and bit line.]
Destructive Reads
To read the DRAM cell, the bit line is precharged to a voltage halfway between HIGH and LOW, and then the word line is set HIGH. Depending on the charge in the capacitor, the precharged bit line is pulled slightly higher or lower. A sense amplifier detects this small change and recovers a 1 or a 0.
[Figure: DRAM cell with word line and bit line.]
Recovering from Destructive Reads
The read operation discharges the capacitor. Therefore a read operation in a dynamic memory must be immediately followed by a write of the value just read, to restore the capacitor charge.
[Figure: DRAM cell with word line and bit line.]
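The read-then-restore behaviour can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class name DramCell and its methods are hypothetical, and the capacitor is reduced to a single stored bit rather than a voltage.

```python
class DramCell:
    """Toy model of a one-transistor DRAM cell (logical, not electrical)."""

    def __init__(self):
        self.charge = 0  # 1 = capacitor charged, 0 = discharged

    def write(self, bit):
        # Bit line HIGH charges the capacitor, LOW discharges it.
        self.charge = 1 if bit else 0

    def raw_read(self):
        # Reading discharges the capacitor, so the stored value is lost.
        value = self.charge
        self.charge = 0
        return value

    def read(self):
        # A complete read: sense the value, then write it back.
        value = self.raw_read()
        self.write(value)
        return value


cell = DramCell()
cell.write(1)
first = cell.raw_read()    # returns the stored 1, but empties the cell
second = cell.raw_read()   # the 1 is gone
cell.write(1)
kept = (cell.read(), cell.read())  # write-back preserves the value
```

In this sketch, raw_read alone loses the stored 1 after a single access, while read keeps returning it because every access ends with a write-back.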
Forgetful Memories
The problem with this cell is that it is not bi-stable: only state 0 can be kept indefinitely. When the cell is in state 1, the charge stored in the capacitor slowly dissipates and the data is lost.
[Figure: DRAM cell with word line and bit line.]
Refreshing the Memory: Why DRAMs are Dynamic
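[Figure: capacitor voltage Vcap versus time, between 0V and VCC with LOW and HIGH thresholds: a 0 is stored, then a 1 is written and periodically refreshed.]
The solution is to periodically refresh the memory cells by reading and writing back each one of them.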
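The effect of refreshing can be sketched with a small simulation. Every number here is a hypothetical choice made for illustration: the voltage is normalized so that VCC = 1.0, the charge decays by a fixed fraction per time unit, and a value above 0.5 is read as a 1.

```python
def simulate(steps, leak=0.1, threshold=0.5, refresh_every=None):
    """Return the bit read back `steps` time units after writing a 1.

    Hypothetical model: voltage normalized to VCC = 1.0, decaying by
    the fraction `leak` per time unit.
    """
    v = 1.0  # a 1 has just been written
    for t in range(1, steps + 1):
        v *= (1.0 - leak)  # charge slowly dissipates
        if refresh_every and t % refresh_every == 0:
            bit = 1 if v > threshold else 0  # read ...
            v = 1.0 if bit else 0.0          # ... and write back
    return 1 if v > threshold else 0


lost = simulate(20)                       # never refreshed
kept = simulate(20, refresh_every=5)      # refreshed in time
too_slow = simulate(20, refresh_every=10) # refreshed after the 1 has decayed
```

With these assumed parameters the unrefreshed cell reads back 0, the cell refreshed every 5 time units still holds its 1, and a refresh period of 10 is already too long.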
[Figure: internal structure of an 8×4 static RAM. A 3-to-8 decoder driven by address inputs A2, A1, A0 selects one of eight rows (0 to 7) of four cells, each cell with IN, OUT, SEL and WR terminals. Data inputs DIN3 to DIN0, data outputs DOUT3 to DOUT0, control inputs WE_L, CS_L and OE_L, internal signals WR_L and IOE_L.]
Bi-directional Data Bus
[Figure: memory cells (IN, OUT, SEL, WR) sharing bidirectional data pins DIO3 to DIO0 with the microprocessor; control signals WE_L, CS_L, OE_L, WR_L and IOE_L.]
DRAM High Level View
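[Figure: a DRAM chip organized as a 4×4 array of supercells (rows 0 to 3, cols 0 to 3) with an internal row buffer. The memory controller (connected to the CPU) drives a 2-bit addr bus and an 8-bit data bus.]
Bryant/O’Hallaron, p. 459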
DRAM RAS Request
[Figure: the memory controller sends RAS = 2 on the 2-bit addr lines; the DRAM chip copies row 2 of the 4×4 supercell array into the internal row buffer.]
RAS = Row Address Strobe
Bryant/O’Hallaron, p. 460
DRAM CAS Request
[Figure: the memory controller sends CAS = 1 on the addr lines; the DRAM chip copies supercell (2,1) from the internal row buffer to the 8-bit data lines.]
CAS = Column Address Strobe
Bryant/O’Hallaron, p. 460
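The RAS and CAS requests can be sketched as two operations on a toy model of the chip. The class Dram, its method names, and the stored value 0xAB are hypothetical and chosen only for illustration.

```python
class Dram:
    """Toy 4x4 DRAM of 8-bit supercells with an internal row buffer."""

    def __init__(self, rows=4, cols=4):
        self.cells = [[0] * cols for _ in range(rows)]
        self.row_buffer = None

    def ras(self, row):
        # Row Address Strobe: copy the whole row into the row buffer.
        self.row_buffer = list(self.cells[row])

    def cas(self, col):
        # Column Address Strobe: pick one supercell out of the row buffer.
        return self.row_buffer[col]


dram = Dram()
dram.cells[2][1] = 0xAB   # hypothetical contents of supercell (2,1)
dram.ras(2)               # RAS = 2: row 2 goes to the row buffer
value = dram.cas(1)       # CAS = 1: supercell (2,1) goes to the data lines
```

The row and column share the same addr lines because they are sent in two separate steps; the row buffer holds the entire row between the two.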
Memory Modules: Supercell (i,j)
[Figure: a 64 MB memory module consisting of eight 8M×8 DRAMs. The memory controller sends addr (row = i, col = j) to every DRAM. DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7 with bits 56-63. Together they form the 64-bit doubleword at main memory address A, which is sent to the CPU chip.]
Bryant/O’Hallaron, p. 461
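How the module builds a 64-bit doubleword from eight supercells can be sketched as follows. Each DRAM is modelled as a plain dictionary from (row, col) to a byte; the function name read_doubleword and the stored byte values are hypothetical.

```python
def read_doubleword(drams, i, j):
    """Assemble a 64-bit doubleword from the supercell (i, j) of 8 DRAMs."""
    word = 0
    for k, dram in enumerate(drams):
        byte = dram[(i, j)] & 0xFF
        word |= byte << (8 * k)  # DRAM k supplies bits 8k .. 8k+7
    return word


# Hypothetical contents: DRAM k stores the byte k + 1 at supercell (3, 5).
drams = [{(3, 5): k + 1} for k in range(8)]
word = read_doubleword(drams, 3, 5)
```

All eight chips receive the same (i, j) address, so one RAS/CAS sequence returns eight bytes at once.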
Read Cycle on an Asynchronous DRAM
Step 1: Apply row address
Step 2: RAS goes from high to low and remains low
Step 3: Apply column address
Step 4: WE must be high
Step 5: CAS goes from high to low and remains low
Step 6: OE goes low
Step 7: Data appears
Step 8: RAS and CAS return to high
Improved DRAMs
Central Idea: Each read to a DRAM actually reads a complete row of bits or word line from the DRAM core into an array of sense amps.
A traditional asynchronous DRAM interface then selects a small number of these bits to be delivered to the cache/microprocessor.
All the other bits already extracted from the DRAM cells into the sense amps are wasted.
Fast Page Mode DRAMs
In a DRAM with Fast Page Mode, a page is defined as all memory addresses that have the same row address.
To read in fast page mode, all the steps from 1 to 7 of a standard read cycle are performed.
Then OE and CAS are switched high, but RAS remains low.
Then steps 3 to 7 (providing a new column address, asserting CAS and OE) are performed for each new memory location to be read.
A Fast Page Mode Read Cycle on an Asynchronous DRAM
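The saving from fast page mode can be sketched by counting how often the row-address part of the cycle (steps 1 and 2) has to be performed for a sequence of reads. The function name and the address sequence are hypothetical, and real timing is ignored.

```python
def count_row_activations(addresses, fast_page_mode):
    """Count how many times steps 1 and 2 run for a list of (row, col) reads."""
    activations = 0
    open_row = None
    for row, _col in addresses:
        if fast_page_mode and row == open_row:
            continue  # same page: only steps 3 to 7 are repeated
        activations += 1  # new row address applied, RAS goes low
        open_row = row
    return activations


reads = [(2, 0), (2, 1), (2, 2), (5, 0), (5, 1)]
standard = count_row_activations(reads, fast_page_mode=False)
paged = count_row_activations(reads, fast_page_mode=True)
```

For this sequence the standard cycle applies a row address five times, while fast page mode does so only twice, once per page.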
Extended Data Out DRAMs (EDO DRAM)
The process to read multiple locations in an EDO-RAM is very similar to the Fast Page Mode.
The difference is that the output drivers are not disabled when CAS goes high.
This distinction allows the data from the current read cycle to be present at the outputs while the next cycle begins.
As a result, faster read cycle times are allowed.
An Extended Data Out Read Cycle on an Asynchronous DRAM
Synchronous DRAMs (SDRAM)
A Synchronous DRAM (SDRAM) has a clock input. It operates in a similar fashion to fast page mode and EDO DRAM. However, consecutive data is output synchronously on the falling/rising edge of the clock, instead of on command by CAS.
How many data elements will be output (the length of the burst) is programmable up to the maximum size of the row.
The clock in an SDRAM typically runs one order of magnitude faster than the access time for individual accesses.
DDR SDRAM
A Double Data Rate (DDR) SDRAM is an SDRAM that allows data transfers both on the rising and falling edge of the clock.
Thus the effective data transfer rate of a DDR SDRAM is two times the data transfer rate of a standard SDRAM with the same clock frequency.
P-H 473
A Quad Data Rate (QDR) SDRAM doubles the data transfer rate again by separating the input and output of a DDR SDRAM.
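The factor of two for DDR can be made concrete with a short calculation. The 100 MHz clock and the 8-byte bus width are hypothetical values chosen for illustration, and the helper name peak_bandwidth is not from the slides.

```python
def peak_bandwidth(clock_hz, bus_bytes, transfers_per_clock):
    """Peak transfer rate in bytes per second."""
    return clock_hz * bus_bytes * transfers_per_clock


# Hypothetical 8-byte (64-bit) bus clocked at 100 MHz.
sdr = peak_bandwidth(100e6, 8, 1)  # one transfer per clock cycle
ddr = peak_bandwidth(100e6, 8, 2)  # transfers on rising and falling edges
```

With the same clock and bus width, the DDR figure is exactly twice the standard SDRAM figure.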
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 27
Main Memory Supporting Caches
• Use DRAMs for main memory
  – Fixed width (e.g., 1 word)
  – Connected by fixed-width clocked bus
• Bus clock is typically slower than CPU clock
P-H 471
Improving Memory Bandwidth
Baer p. 248
SIMM × DIMM
SIMM ≡ Single Inline Memory Module
DIMM ≡ Dual Inline Memory Module
Uses two edges of the physical connector → twice as many connections to the chip
Memory System Example
[Figure: cache connected to memory through the memory bus.]
1-word wide DRAM
4-word cache block
1 bus cycle for address transfer
15 bus cycles per DRAM access
1 bus cycle per data transfer
Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
Bandwidth = 16 bytes / 65 cycles = 0.25 byte/cycle
P-H 471
Example: Wider Memory
[Figure: cache connected to memory through the memory bus.]
4-word wide DRAM
4-word cache block
1 bus cycle for address transfer
15 bus cycles per DRAM access
1 bus cycle per data transfer
Miss penalty = 1 + 15 + 1 = 17 bus cycles
Bandwidth = 16 bytes / 17 cycles = 0.94 byte/cycle
Wider bus/memories are costly!
P-H 471
Example: Interleaved Memory
[Figure: cache connected through the memory bus to four memory banks.]
4-word cache block
1 bus cycle for address transfer
15 bus cycles per DRAM access
1 bus cycle per data transfer
Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
Bandwidth = 16 bytes / 20 cycles = 0.8 byte/cycle
P-H 471
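The three miss-penalty calculations (1-word wide, 4-word wide, and interleaved memory) can be reproduced with a few lines of arithmetic. The cycle counts are the ones given in the examples; the 4-byte word size is an assumption implied by the 16-byte block, and the function names are hypothetical.

```python
ADDR, ACCESS, TRANSFER = 1, 15, 1  # bus cycles, as given in the examples
BLOCK_WORDS, WORD_BYTES = 4, 4     # 4-word block; 4-byte words assumed


def one_word_wide():
    # Every word needs its own DRAM access and its own transfer.
    return ADDR + BLOCK_WORDS * ACCESS + BLOCK_WORDS * TRANSFER


def four_word_wide():
    # One access and one transfer move the whole block.
    return ADDR + ACCESS + TRANSFER


def interleaved():
    # The banks are accessed in parallel; the words are transferred one by one.
    return ADDR + ACCESS + BLOCK_WORDS * TRANSFER


def bandwidth(penalty):
    """Bytes per bus cycle for one cache-block fill."""
    return BLOCK_WORDS * WORD_BYTES / penalty


penalties = (one_word_wide(), four_word_wide(), interleaved())
rates = tuple(round(bandwidth(p), 2) for p in penalties)
```

Interleaving recovers most of the benefit of the wide organization (20 versus 17 cycles) while keeping the bus one word wide.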
Split-Transaction Bus
Issue: Memory should not hold the processor-memory bus while it fetches the data to its buffers.
Solution: Split-transaction bus
Example (load):
Phase 1: Processor sends address and operation type to the bus, then releases the bus.
Phase 2: Memory fetches data into its buffers.
Phase 3: Memory controller requests the bus, memory sends the data onto the bus, then releases the bus.
Phase 1 for access A can be in parallel with Phase 2 for access B.
Baer p. 250
Bank Interleaving and Cache Indexing
[Figure: address fields for the cache (Cache Tag, Cache Index, Cache Displ.) aligned against the Bank Index, Page Index and Page Offset fields under line interleaving and page interleaving.]
Issue: In both cases, the cache index overlaps the bank index
⇒ on a miss, the missing line is in the same bank as the replaced line.
⇒ full penalty for precharge, row and column access
Baer p. 249
Bank Interleaving and Cache Indexing
Solution: bank rehash by XORing the k bits of the bank index with k bits of the tag.
Baer p. 250
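The XOR rehash can be sketched as follows. The address layout is an assumption made for illustration: addresses are cache-line addresses (displacement already removed), the bank index is the low k bits, and the tag starts at bit tag_shift. The function name and the example addresses are hypothetical.

```python
def bank_index(line_addr, k, tag_shift, rehash):
    """Return the k-bit bank index of a cache-line address."""
    mask = (1 << k) - 1
    bank = line_addr & mask  # low k bits select the bank
    if rehash:
        bank ^= (line_addr >> tag_shift) & mask  # XOR with k bits of the tag
    return bank


# Two lines with the same cache index (0x15) but different tags (0x3, 0x6),
# for an assumed cache of 256 lines (tag_shift = 8) and 4 banks (k = 2).
missing = (0x3 << 8) | 0x15
replaced = (0x6 << 8) | 0x15

plain = (bank_index(missing, 2, 8, False), bank_index(replaced, 2, 8, False))
hashed = (bank_index(missing, 2, 8, True), bank_index(replaced, 2, 8, True))
```

Without the rehash both lines fall in the same bank; with it they land in different banks whenever their tags differ in the k bits used for the XOR.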
Memory Controller
Transactions do not need to be processed in order.
Intelligent controllers optimize accesses by reordering transactions.
Baer p. 250
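One simple reordering policy, assumed here purely for illustration and not taken from the slides, serves requests that hit the currently open row before older requests to other rows. The function names and the request list are hypothetical.

```python
def schedule(requests, open_row=None):
    """Order (id, row) requests, serving hits on the open row first.

    `requests` is listed oldest first; among equal candidates the
    oldest request is chosen.
    """
    pending = list(requests)
    order = []
    while pending:
        hit = next((r for r in pending if r[1] == open_row), None)
        chosen = hit if hit is not None else pending[0]
        pending.remove(chosen)
        open_row = chosen[1]  # the chosen request's row is now open
        order.append(chosen[0])
    return order


def row_switches(requests, order):
    """Count how often consecutive requests in `order` change rows."""
    rows = dict(requests)
    seq = [rows[i] for i in order]
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


requests = [("A", 1), ("B", 2), ("C", 1), ("D", 2)]
in_order = [r[0] for r in requests]
reordered = schedule(requests)
switches = (row_switches(requests, in_order), row_switches(requests, reordered))
```

For this request stream, in-order service switches rows three times while the reordered schedule switches only once, which avoids repeated precharge and row accesses.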
Memory Controller
Ipek, ISCA 2008, p. 40
Why is the controller’s job difficult?
1. Must obey more than 50 timing constraints
2. Must prioritize requests to optimize performance
Scheduling decisions have long-term consequences:
Future requests depend on which request is served first (which instruction is unblocked).
Benefit of a scheduling decision depends on future processor behavior.
Reinforcement-Learning Controller
Ipek, ISCA 2008, p. 41
Reinforcement-Learning Controller
Ipek, ISCA 2008, p. 42
Reinforcement Learning Controller Performance
4-core system
Peak BW: 6.4 GB/s
Bus Utilization:
In-Order  FR-FCFS  RL   Optimistic
26%       46%      56%  80%
Ipek, ISCA 2008, p. 42
Online RL is better than offline RL
Ipek, ISCA 2008, p. 48
Rambus
Narrow and fast buses.
Split transactions
Separate row and column control lines
16 internal banks
400 MHz --- 1.6 GB/s
Introduced in 1997
SDRAMs were at 100 MHz and had a peak of 0.4 GB/s
⇒ 2010: 64-bit DDR DRAMs at 133 MHz, same peak
DRAM Generations
Year  Capacity  $/GB
1980  64Kbit    $1500000
1983  256Kbit   $500000
1985  1Mbit     $200000
1989  4Mbit     $50000
1992  16Mbit    $15000
1996  64Mbit    $10000
1998  128Mbit   $4000
2000  256Mbit   $1000
2004  512Mbit   $250
2007  1Gbit     $50
P-H 474
[Figure: access time to a new row/column (ns) versus year.]