Computer Architecture 2011 – peripherals
Computer Architecture
Peripherals
By Dan Tsafrir, 6/6/2011
Presentation based on slides by Lihu Rappoport
MEMORY: REMINDER
Not so long ago…
[Chart: CPU vs. DRAM performance over time, 1980–2000, log scale.
DRAM improved ~9% per year (2x in 10 years); CPUs improved ~60% per year (2x in 1.5 years).
The gap grew ~50% per year.]
Not so long ago…

In 1994, in their paper “Hitting the Memory Wall: Implications of the Obvious”, William Wulf & Sally McKee said:
“We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed – each is improving exponentially, but the exponent for microprocessors is substantially larger than that for DRAMs.
The difference between diverging exponentials also grows exponentially; so, although the disparity between processor and memory speed is already an issue, downstream someplace it will be a much bigger one.”
More recently (2008)…

[Chart: “The memory wall in the multicore era”: performance in seconds (lower = slower) as a function of the number of processor cores, for a conventional architecture.]
Memory Trade-Offs

· Large (dense) memories are slow
· Fast memories are small, expensive, and consume a lot of power
· Goal: give the processor the feeling that it has a memory which is large (dense), fast, low-power, and cheap
· Solution: a hierarchy of memories

CPU → L1 cache → L2 cache → L3 cache → Memory (DRAM)
Speed: fastest → slowest; Size: smallest → biggest; Cost: highest → lowest; Power: highest → lowest
Typical levels in the memory hierarchy

Response time   Size          Memory level
≈ 0.5 ns        ≈ 100 bytes   CPU registers
≈ 1 ns          ≈ 64 KB       L1 cache
≈ 15 ns         ≈ 1–4 MB      L2 cache
≈ 150 ns        ≈ 1–4 GB      Main memory (DRAM)
≈ 15 ms         ≈ 1–2 TB      Hard disk (SATA)
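The payoff of such a hierarchy is that, if most accesses hit the fast levels, the average access time stays close to theirs. A minimal sketch of that arithmetic, using the latencies from the table above and assumed, purely illustrative hit rates (95% for L1, 80% for L2):

```python
# Average memory access time (AMAT): the hit time of a level plus the
# miss-rate-weighted cost of going to the next level down.
# Latencies follow the table above; the hit rates are assumptions.
def amat(hit_time_ns, miss_rate, next_level_ns):
    return hit_time_ns + miss_rate * next_level_ns

dram = 150.0                   # main memory, ~150 ns
l2 = amat(15.0, 0.20, dram)    # L2: 15 ns hit, assumed 20% miss -> 45 ns
l1 = amat(1.0, 0.05, l2)       # L1: 1 ns hit, assumed 5% miss -> 3.25 ns
print(l1)
```

Even though DRAM is 150x slower than L1, the effective access time under these assumptions is only ~3.25 ns.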
DRAM & SRAM
DRAM basics

· DRAM = dynamic random-access memory
– Random access = access cost is the same for every location (well, not really)
· The CPU thinks of DRAM as one-dimensional
– Simpler
· But DRAM is actually arranged as a 2-D grid
– Need row & column addresses to access it
– Given a “1-D address”, the DRAM interface splits it into row & column parts
– Some time must elapse between the row & column accesses (10s of ns)
DRAM basics

· Why 2-D? Why delayed row & column accesses?
– Every address bit requires a physical pin
– DRAMs are large (GBs nowadays) => many pins => more expensive
· A DRAM array has
– A row decoder: extracts the row number from the memory address
– A column decoder: extracts the column number from the memory address
– Sense amplifiers: hold the row when it is (1) written to, (2) read from, or (3) refreshed (see next slide)
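The pin-count argument can be made concrete: multiplexing the row and column halves of the address over the same pins roughly halves the number of address pins. A sketch with an assumed 1 Gbit array (the size is illustrative):

```python
# Address pins needed for a hypothetical 1 Gbit (2**30-cell) DRAM.
address_bits = 30                      # 2**30 addressable cells

# 1-D addressing: the whole address is presented at once.
flat_pins = address_bits               # 30 address pins

# 2-D addressing: a square 2**15 x 2**15 array; the row half and the
# column half are sent one after the other over the same pins.
row_bits = col_bits = address_bits // 2
muxed_pins = max(row_bits, col_bits)   # 15 address pins

print(flat_pins, muxed_pins)
```

The saving comes at a cost: the row and column phases must be serialized, which is exactly the delay between row and column accesses mentioned above.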
DRAM basics

· One transistor-capacitor pair per bit
· Capacitors leak => each row must be refreshed every few ms
– DRAM spends ~1% of its time refreshing
– “Opening” a row = fetching it into the sense amplifiers = refreshing it
· Is it worth making the DRAM array a rectangle (rather than a square)?
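The “~1% of time refreshing” figure is easy to sanity-check with assumed, typical-looking parameters (none of these numbers come from the slide):

```python
# Rough estimate of the fraction of time a DRAM bank spends refreshing.
# All parameters are illustrative assumptions.
rows_per_bank = 8192          # rows that must each be refreshed
refresh_window_ms = 64.0      # every row refreshed once per 64 ms
row_refresh_ns = 60.0         # time to open + precharge one row

busy_ns = rows_per_bank * row_refresh_ns
window_ns = refresh_window_ms * 1e6
overhead = busy_ns / window_ns
print(f"{overhead:.2%}")      # on the order of 1%
```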
x1 DRAM

[Diagram: a memory array of rows × columns, with a row decoder selecting a row, sense amplifiers holding it, a column decoder selecting within it, and data in/out buffers; one bit is output per access.]
DRAM banks

· Each DRAM memory array outputs one bit
· DRAMs use multiple arrays to output multiple bits at a time
– xN indicates a DRAM with N memory arrays
– Typical today: x16, x32
· Each collection of N arrays forms a DRAM bank
– Each bank can be read/written independently
x4 DRAM

[Diagram: four memory arrays side by side, each with its own row decoder, sense amplifiers, column decoder, and data in/out buffers; the same row/column address drives all four, so each access outputs four bits, one per array.]
Ranks & DIMMs

· DIMM: (dual in-line) memory module, the unit we connect to the motherboard
· Bandwidth is increased by delivering data from multiple banks
– The bandwidth of one bank is limited => put multiple banks on a DIMM
– The bus has a higher clock frequency than any one DRAM
– The bus controller switches between banks to achieve a high data rate
· Capacity is increased by utilizing multiple ranks
– Each rank is an independent set of banks that can be accessed at the full data bit-width: 64 bits for non-ECC, 72 for ECC (error-correcting code)
– Ranks cannot be accessed simultaneously, as they share the same data path
Ranks & DIMMs

[Photo: a 1 GB 2Rx8 DIMM, i.e., 2 ranks of x8 DRAM chips]
Modern DRAM organization

· A system has multiple DIMMs
· Each DIMM has multiple DRAM banks, arranged in one or more ranks
· Each bank has multiple DRAM arrays
· Concurrency across banks increases memory bandwidth
Memory controller

[Diagram: a memory controller connected to two DIMMs; both share the address/command bus and the data bus, while separate chip-select signals (chip select 1, chip select 2) pick which one responds.]
Memory controller

· Functionality: executes the processor’s memory requests
· In earlier systems: a separate, off-processor chip
· In modern systems: integrated on-chip with the processor
· Interconnect with the processor: typically a bus, but can be point-to-point or through a crossbar
Lifetime of a memory access

1. Processor orders & queues memory requests
2. Request(s) sent to the memory controller
3. Controller queues & orders the requests
4. For each request in the queue, when the time is right:
   1. Controller waits until the requested DRAM is ready
   2. Controller breaks the address bits into rank, bank, row, and column fields
   3. Controller sends a chip-select signal to select the rank
   4. The selected bank is pre-charged to activate the selected row
   5. The row is activated within the selected DRAM bank, using the RAS (row-address strobe) signal
   6. The (entire) row is sent to the sense amplifiers
   7. The desired column is selected, using the CAS (column-address strobe) signal
   8. The data is sent back
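The controller’s job of breaking the address into rank/bank/row/column fields amounts to slicing bit ranges out of the physical address. A sketch with a hypothetical bit layout (real controllers choose field widths and positions to maximize bank concurrency; these values are made up):

```python
# Decompose a physical address into rank/bank/row/column fields.
# The field widths and their order are a hypothetical example,
# not any real controller's mapping.
COL_BITS, ROW_BITS, BANK_BITS, RANK_BITS = 10, 14, 3, 1

def decode(addr):
    col  = addr & ((1 << COL_BITS) - 1);  addr >>= COL_BITS
    row  = addr & ((1 << ROW_BITS) - 1);  addr >>= ROW_BITS
    bank = addr & ((1 << BANK_BITS) - 1); addr >>= BANK_BITS
    rank = addr & ((1 << RANK_BITS) - 1)
    return rank, bank, row, col

print(decode(0x12345678))
```

Placing the column bits lowest means consecutive addresses fall in the same row, so sequential accesses can reuse an already-open row.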
Basic DRAM array

· Timing (2 phases):
– Decode the row address + assert RAS#
– Wait for the “RAS-to-CAS delay”
– Decode the column address + assert CAS#
– Transfer DATA

[Diagram: the memory address bus feeds a row latch + row-address decoder (clocked by RAS#) and a column latch + column-address decoder (clocked by CAS#), which together select a cell in the memory array and drive the Data pins.]
DRAM timing

· CAS latency
– Number of clock cycles needed to access a specific column of data
– Measured from the moment the memory controller issues a column address in the currently open row until the data is read out of memory
· RAS-to-CAS delay
– Number of cycles between the row access and the column access
· Row pre-charge time
– Number of cycles needed to close the opened row and open the next row
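These three parameters dominate a worst-case random access: close the old row, open the new one, then read the column. A rough illustration for a hypothetical DDR3-1600 part with 11-11-11 timings (the part and its timings are assumptions, not from the slide):

```python
# Worst-case vs. best-case access latency from the three timing
# parameters above. Hypothetical DDR3-1600, CL-tRCD-tRP = 11-11-11.
clock_mhz = 800.0              # DDR3-1600 command clock
cycle_ns = 1000.0 / clock_mhz  # 1.25 ns per cycle

tRP, tRCD, CL = 11, 11, 11     # precharge, RAS-to-CAS, CAS latency (cycles)

miss_ns = (tRP + tRCD + CL) * cycle_ns  # wrong row open: close + open + read
hit_ns = CL * cycle_ns                  # right row already open: just read
print(miss_ns, hit_ns)
```

Under these assumptions a row miss costs ~41 ns versus ~14 ns for a row hit, which is why controllers try hard to schedule requests to already-open rows.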
Addressing sequence

· Access sequence:
– Put the row address on the address bus (A[0:7]) and assert RAS#
– Wait for the RAS#-to-CAS# delay (tRCD)
– Put the column address on the address bus and assert CAS#
– After the CAS latency, DATA is transferred
– Pre-charge (precharge delay) before the next row can be opened

[Timing diagram: RAS#, CAS#, A[0:7], and Data waveforms; Row i appears on the address lines, then after the RAS/CAS delay Col n, then Data n after the CAS latency. The whole sequence is the access time, followed by the precharge delay before Row j can be issued.]
Improved DRAM Schemes

· Paged-mode DRAM
– Multiple accesses to different columns of the same row (spatial locality)
– Saves the time it takes to bring in a new row (but might be unfair)
· Extended Data Out RAM (EDO RAM)
– A data-output latch allows the next column address to be driven in parallel with the current column’s data

[Timing diagrams: paged mode shows one RAS# assertion (Row) followed by multiple CAS# pulses (Col n, Col n+1, Col n+2), each returning Data n, n+1, n+2; EDO shows the same sequence, but the output latch holds each datum through the next CAS#, overlapping the transfers.]
Improved DRAM Schemes (cont.)

· Burst DRAM
– Generates consecutive column addresses by itself

[Timing diagram: one RAS# assertion (Row) and a single CAS# pulse (Col n) yield a burst of Data n, n+1, n+2.]
Synchronous DRAM (SDRAM)

· Asynchrony in plain DRAM
– Due to RAS & CAS arriving at arbitrary times
· Synchronous DRAM
– Uses a clock to deliver requests at regular intervals
– More predictable DRAM timing => less skew => faster turnaround
· SDRAMs support burst-mode access
– Initial performance was similar to BEDO (= burst + EDO)
– Clock scaling enabled higher transfer rates later
– => DDR SDRAM => DDR2 => DDR3
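The rate scaling can be quantified: peak bandwidth is transfers per second times bus width. A sketch using the standard DDR3-1600 numbers (DDR moves data on both clock edges, so an 800 MHz clock gives 1600 MT/s):

```python
# Peak transfer rate of a DIMM channel: transfers/sec x bus width.
# Standard DDR3-1600 figures; 64-bit (8-byte) non-ECC data path.
transfers_per_sec = 1600e6    # 1600 MT/s (800 MHz clock, both edges)
bus_bytes = 8                 # 64-bit data path
peak_gb_s = transfers_per_sec * bus_bytes / 1e9
print(peak_gb_s)              # 12.8 GB/s per channel
```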
DRAM vs. SRAM

(Random access = access time the same for all locations)

                 DRAM (dynamic RAM)        SRAM (static RAM)
Refresh          Yes (~1% of time)         No
Address          Multiplexed: row + col    Not multiplexed
Random access    Not really…               Yes
Density          High (1 transistor/bit)   Low (6 transistors/bit)
Power            Low                       High
Speed            Slow                      Fast
Price/bit        Low                       High
Typical usage    Main memory               Cache