Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
ECE 4100/6100 Advanced Computer Architecture
Lecture 11: DRAM and Storage
Prof. Hsien-Hsin Sean Lee, School of Electrical and Computer Engineering, Georgia Institute of Technology
2
The DRAM Cell
• Why DRAMs
  – Higher density than SRAMs
• Disadvantages
  – Longer access times
  – Leaky, needs to be refreshed
  – Cannot be easily integrated with CMOS logic
[Figure: 1T1C DRAM cell with word line (control), bit line (information), and storage capacitor; stacked capacitor vs. trench capacitor. Source: Memory Arch Course, INSA Toulouse]
3
One DRAM Bank
[Figure: one DRAM bank; the row decoder drives a wordline, the cells share bitlines into rows of sense amps, and the column decoder with I/O gating selects the data out for a given address]
4
Example: 512Mb 4-bank DRAM (x4)
• Address multiplexing: row address A[13:0] (16K rows), column address A[10:0] (2K columns), bank address BA[1:0]
• Each bank is 16384 x 2048 x 4 with its own row decoder, sense amps, and column decoder behind shared I/O gating; data out is D[3:0]
• A DRAM page = 2K x 4 = 1KB
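As a rough illustration of the address multiplexing above, here is a minimal sketch of splitting a flat cell address into row, bank, and column fields for this 512Mb x4 organization. The field ordering, the struct, and the function name are assumptions made for the example; real memory controllers choose their own interleaving.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative address split for a 512Mb x4, 4-bank DRAM:
 * 16384 rows (14 bits), 2048 columns (11 bits), 4 banks (2 bits).
 * The bit ordering below (row | bank | column) is an assumption. */
typedef struct {
    uint32_t row;     /* A[13:0], 16K rows   */
    uint32_t bank;    /* BA[1:0], 4 banks    */
    uint32_t column;  /* A[10:0], 2K columns */
} dram_addr_t;

static dram_addr_t decode(uint32_t cell_addr) {
    dram_addr_t a;
    a.column = cell_addr & 0x7FF;          /* low 11 bits  */
    a.bank   = (cell_addr >> 11) & 0x3;    /* next 2 bits  */
    a.row    = (cell_addr >> 13) & 0x3FFF; /* top 14 bits  */
    return a;
}

int main(void) {
    dram_addr_t a = decode(0x1234567);
    printf("row=%u bank=%u col=%u\n", a.row, a.bank, a.column);
    return 0;
}
```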
5
DRAM Cell Array
[Figure: array of cells at the intersections of wordlines 0 to 1023 and bitlines 0 to 15]
6
DRAM Sensing (Open Bitline Array)
[Figure: two DRAM subarrays, one with WL0 to WL127 and one with WL128 to WL255, sharing the sense amps placed between them]
8
DRAM Basics
• Address multiplexing
  – Send row address when RAS asserted
  – Send column address when CAS asserted
• DRAM reads are self-destructive
  – Rewrite after a read
• Memory array
  – All bits within an array work in unison
• Memory bank
  – Different banks can operate independently
• DRAM rank
  – Chips inside the same rank are accessed simultaneously
9
Examples of DRAM DIMM Standards
• x64 (no ECC): eight x8 chips providing data bits D0 to D63
• x72 (ECC): nine x8 chips providing data bits D0 to D63 plus check bits CB0 to CB7
10
DRAM Ranks
[Figure: a memory controller drives a DIMM with two ranks; each rank is eight x8 chips that together supply D0 to D63, and chip selects CS0 and CS1 choose which rank responds]
11
DRAM Ranks
• Single rank: eight x8 devices (8b each) form the 64b data bus
• Single rank: sixteen x4 devices (4b each) form the 64b data bus
• Dual rank: two groups of eight x8 devices, each group forming its own 64b rank
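A trivial sanity check of the device counts above; the function name is mine and a non-ECC 64-bit bus is assumed.

```c
#include <stdio.h>

/* Chips needed to fill a 64-bit data bus with devices of a given width.
 * Illustrative only: assumes a non-ECC, 64-bit rank. */
static int chips_per_rank(int device_width_bits) {
    return 64 / device_width_bits;
}

int main(void) {
    printf("x8 devices per rank: %d\n", chips_per_rank(8));  /* 8  */
    printf("x4 devices per rank: %d\n", chips_per_rank(4));  /* 16 */
    return 0;
}
```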
12
DRAM Organization
Source: Memory Systems Architecture Course, B. Jacobs, Maryland
13
Organization of DRAM Modules
Source: Memory Systems Architecture Course, Bruce Jacobs, University of Maryland
[Figure: a memory controller connected over a channel (address/command bus plus data bus) to multi-banked DRAM chips on the modules]
14
DRAM Configuration Example (Source: MICRON DDR3 DRAM)
15
DRAM Access (Non-Nibble Mode)
[Figure: the memory controller drives the DRAM module over the address bus and the WE/CAS/RAS control signals. It asserts RAS while presenting the row address; once the row is opened it asserts CAS with a column address and data appears on the data bus. While the row stays open, further column addresses can be issued to transfer more data]
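As a rough companion to this sequence, the sketch below models the latency the controller sees for a single access; the timing parameters (tRP, tRCD, CL) and their values are illustrative assumptions, not figures from the lecture.

```c
#include <stdio.h>

/* Illustrative DRAM access latency model (units: memory-clock cycles).
 * The parameter values are assumptions for the example only. */
#define T_RP   15  /* precharge: close the currently open row   */
#define T_RCD  15  /* RAS-to-CAS delay: open (activate) a row    */
#define T_CL   15  /* CAS latency: column access to data out     */

static int access_latency(int row_already_open, int row_hit) {
    if (!row_already_open)          /* bank idle: activate + read       */
        return T_RCD + T_CL;
    if (row_hit)                    /* open-row hit: column access only */
        return T_CL;
    return T_RP + T_RCD + T_CL;     /* conflict: close, reopen, read    */
}

int main(void) {
    printf("row hit     : %d cycles\n", access_latency(1, 1));
    printf("bank idle   : %d cycles\n", access_latency(0, 0));
    printf("row conflict: %d cycles\n", access_latency(1, 0));
    return 0;
}
```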
16
DRAM Refresh
• Leaky storage
• Periodic refresh across DRAM rows
• Rows are unavailable while being refreshed
• Refresh = read a row and write the same data back
• Example:
  – 4k rows in a DRAM
  – 100ns read cycle
  – Data decays in 64ms
  – 4096 * 100ns = 410µs to refresh all rows once
  – 410µs / 64ms = 0.64% unavailability
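A quick numerical check of this example, written as a throwaway sketch; the DRAM parameters are the ones from the slide, and the last line also derives the per-row interval used by the distributed refresh style on the next slide.

```c
#include <stdio.h>

int main(void) {
    const double rows        = 4096;     /* rows to refresh         */
    const double t_refresh_s = 100e-9;   /* 100 ns per row refresh  */
    const double t_retention = 64e-3;    /* 64 ms retention time    */

    double burst_s  = rows * t_refresh_s;           /* ~410 us burst       */
    double overhead = burst_s / t_retention * 100;  /* ~0.64% unavailable  */
    double interval = t_retention / rows;           /* ~15.6 us per row    */

    printf("burst refresh time  : %.1f us\n", burst_s * 1e6);
    printf("refresh overhead    : %.2f %%\n", overhead);
    printf("distributed interval: %.1f us\n", interval * 1e6);
    return 0;
}
```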
17
DRAM Refresh Styles
• Bursty: refresh all 4096 rows back-to-back (4096 * 100ns ≈ 410µs) once in every 64ms retention period
• Distributed: refresh one row (100ns) every 64ms / 4096 ≈ 15.6µs
18
DRAM Refresh Policies
• RAS-Only Refresh
  – The memory controller places the address of the row to refresh on the address bus and asserts RAS; the DRAM refreshes that row
• CAS-Before-RAS (CBR) Refresh
  – The memory controller asserts CAS (with WE# held high) and then asserts RAS; no address is involved
  – The DRAM refreshes the row selected by its internal address counter, then increments the counter
[Figure: memory controller and DRAM module connected by the address bus and the WE#/CAS/RAS control signals, showing the signal sequence for each refresh style]
19
Types of DRAM
• Asynchronous DRAM
  – Normal: Responds to RAS and CAS signals (no clock)
  – Fast Page Mode (FPM): Row remains open after RAS for multiple CAS commands
  – Extended Data Out (EDO): Change output drivers to latches; data can be held on the bus for a longer time
  – Burst Extended Data Out (BEDO): Internal counter drives the address latch; able to provide data in burst mode
• Synchronous DRAM
  – SDRAM: All of the above with a clock; adds predictability to DRAM operation
  – DDR, DDR2, DDR3: Transfer data on both edges of the clock
  – FB-DIMM: DIMMs connected using point-to-point links instead of a bus; allows more DIMMs to be incorporated in server-based systems
• RDRAM
  – Low pin count
20
Disk Storage
21
Disk Organization
[Figure: disk geometry showing platters (1 to 12), tracks per surface (5,000 to 30,000), sectors per track (100 to 500, 512 bytes each), cylinders, and a spindle speed of 3600 to 15000 RPM]
22
Disk Organization
[Figure: read/write head, flying tens of nanometers above the magnetic surface, at the end of the arm]
23
Disk Access Time
• Seek time
  – Move the arm to the desired track
  – 5ms to 12ms
• Rotation latency (or delay)
  – For example, the average rotation latency of a 10,000 RPM disk is 3ms (= 0.5 / (10,000/60))
• Data transfer latency (or throughput)
  – Tens to hundreds of MB per second
  – E.g., the Seagate Cheetah 15K.6 sustains 164MB/sec
• Disk controller overhead
• Use a disk cache (or cache buffer) to exploit locality
  – 4 to 32MB today
  – Comes with the embedded controller in the HDD
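A back-of-the-envelope sketch that adds up these components; only the 10,000 RPM and 164MB/sec figures come from the slide, while the seek time, request size, and controller overhead below are assumed for illustration.

```c
#include <stdio.h>

int main(void) {
    /* Assumed example values; only the 10,000 RPM figure and the
     * 164 MB/s sustained rate come from the slide. */
    double seek_ms       = 8.0;                      /* 5-12 ms typical   */
    double rpm           = 10000.0;
    double rotation_ms   = 0.5 * 60.0 / rpm * 1000;  /* half a revolution */
    double transfer_ms   = 4096.0 / 164e6 * 1000;    /* 4KB at 164 MB/s   */
    double controller_ms = 0.2;                      /* assumed overhead  */

    printf("avg rotation latency: %.2f ms\n", rotation_ms);  /* 3.00 ms */
    printf("avg access time     : %.2f ms\n",
           seek_ms + rotation_ms + transfer_ms + controller_ms);
    return 0;
}
```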
24
Reliability, Availability, Dependability
• Program faults
25
Reliability, Availability, Dependability
• Program faults
• Static (permanent) faults
  – Design flaws
    • e.g., the Pentium FDIV bug, roughly $500 million
  – Manufacturing
    • Stuck-at faults
    • Process variability
• Dynamic faults
  – Soft errors
  – Noise-induced
  – Wear-out
26
Solution Space
• DRAM / SRAM
  – Use ECC (SECDED)
• Disks
  – Use redundancy
    • User backups
    • Disk arrays
27
RAID
• Reliability and performance considerations
• Redundant Array of Inexpensive Disks
• Combine multiple small, inexpensive disk drives
• Break arrays into "reliability groups"
• Data are divided and replicated across multiple disk drives
• RAID-0 to RAID-5
• Hardware RAID
  – Dedicated HW controller
• Software RAID
  – Implemented in the OS
28
Basic Principles
• Data mirroring
• Data striping
• Error correction code
29
RAID-1
• Mirrored disks
• Most expensive (100% overhead)
• Every write to the data disk also writes to the check disk
• Can improve read/seek performance with a sufficient number of controllers
[Figure: Disk 0 (data disk) and Disk 1 (check disk) holding identical copies of blocks A0 to A4]
30
RAID-10
• Combine data striping on top of RAID-1
[Figure: six disks arranged as three mirrored pairs; stripes are spread across the pairs, e.g., disks 0/1 hold A0, A3, B2, B5, disks 2/3 hold A1, B0, B3, C0, and disks 4/5 hold A2, B1, B4]
31
RAID-2
• Bit-interleaved striping
• Use a Hamming code to generate and store ECC on check disks (e.g., Hamming(7,4))
  – Space: 4 data disks need 3 check disks (75% overhead), 10 data disks need 4 check disks (40% overhead), 25 data disks need 5 check disks (20%)
  – The CPU needs more compute power to generate a Hamming code than simple parity
• Complex controller
• Not really used today!
[Figure: four data disks, where data disk i holds bit i of each word A to D, plus three check disks holding the Hamming ECC bits ECC0 to ECC2 for each word]
32
RAID-3
• Byte-level striping
• Use XOR parity to generate and store the parity code on the check disk
• At least 3 disks: 2 data disks + 1 check disk
[Figure: one transfer unit is striped byte by byte across data disks 0 to 3 (bytes A0 to A3, B0 to B3, ...), with the XOR parity byte for each slice (ECCa to ECCd) on check disk 0]
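A minimal sketch of the XOR parity that RAID-3 (and RAID-4/5) store on the check disk; the disk count, strip size, and data values below are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define NDATA  4    /* data disks (illustrative)      */
#define STRIPE 8    /* bytes per strip (illustrative) */

/* Parity strip = XOR of the corresponding bytes on every data disk.
 * Any single lost strip can be rebuilt by XOR-ing the survivors. */
static void make_parity(const uint8_t data[NDATA][STRIPE], uint8_t parity[STRIPE]) {
    for (int b = 0; b < STRIPE; b++) {
        parity[b] = 0;
        for (int d = 0; d < NDATA; d++)
            parity[b] ^= data[d][b];
    }
}

int main(void) {
    uint8_t data[NDATA][STRIPE] = { {1,2,3,4,5,6,7,8},
                                    {9,9,9,9,9,9,9,9},
                                    {0,1,0,1,0,1,0,1},
                                    {7,7,7,7,7,7,7,7} };
    uint8_t parity[STRIPE];
    make_parity(data, parity);

    /* Pretend disk 2 failed: rebuild its first byte from the others plus parity. */
    uint8_t rebuilt = parity[0] ^ data[0][0] ^ data[1][0] ^ data[3][0];
    printf("rebuilt byte = %u (expected %u)\n", rebuilt, data[2][0]);
    return 0;
}
```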
33
RAID-4
• Block-level striping
• Keep each individually accessed unit on one disk
  – Do not access all disks for (small) transfers
  – Improved parallelism
• Use XOR parity to generate and store the parity code on the check disk
• Check info is calculated over a piece of each transfer unit
• Small read: one read on one disk
• Small write: two reads and two writes (data and check disks); a parity-update sketch follows the figure below
  – New parity = (old data XOR new data) XOR old parity
  – No need to read B0, C0, and D0 when read-modify-writing A0
• Writes are the bottleneck, as every write accesses the check disk
[Figure: blocks A0 to A3 on data disk 0, B0 to B3 on data disk 1, C0 to C3 on data disk 2, D0 to D3 on data disk 3, and parity blocks ECC0 to ECC3 on the check disk]
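The small-write parity update mentioned above, as a minimal sketch that also applies to RAID-5; the buffer names and example sizes are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* RAID-4/5 small-write parity update:
 *   new parity = (old data XOR new data) XOR old parity
 * Only the target data block and the parity block are read and rewritten;
 * the other data disks are left untouched. */
static void update_parity(const uint8_t *old_data, const uint8_t *new_data,
                          uint8_t *parity, size_t n) {
    for (size_t i = 0; i < n; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}

int main(void) {
    uint8_t a0_old[4] = {1, 2, 3, 4}, b0[4] = {5, 6, 7, 8};
    uint8_t a0_new[4] = {9, 9, 9, 9};
    uint8_t p[4];
    for (int i = 0; i < 4; i++) p[i] = a0_old[i] ^ b0[i]; /* full-stripe parity */

    update_parity(a0_old, a0_new, p, 4);  /* small write to A0; B0 is never read */
    printf("parity check: %s\n", (p[0] == (uint8_t)(a0_new[0] ^ b0[0])) ? "ok" : "bad");
    return 0;
}
```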
RAID-3 vs. RAID-4
34
[Figure: to write a new D0, RAID-3 reads D1, D2, and D3 and XORs them with the new D0 to produce the new parity P, whereas RAID-4 reads only the old D0 and the old P and computes the new P with two XORs (new D0 XOR old D0 XOR old P)]
35
RAID-5
• Block-level striping
• Distributed parity to enable write parallelism; removes the bottleneck of accessing a single parity disk
• Example: write "sector A" and write "sector B" can be performed simultaneously
[Figure: five disks holding data blocks A to E with the parity blocks ECC0 to ECC4 rotated across all of the disks, so no single disk holds all of the parity]
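A small sketch of one way the parity block can be rotated across disks; the left-asymmetric placement below is a common convention chosen for illustration, not necessarily the layout shown in the figure.

```c
#include <stdio.h>

#define NDISKS 5

/* Left-asymmetric RAID-5 layout (one common convention, assumed here):
 * the parity block of stripe s lives on disk (NDISKS - 1 - s % NDISKS),
 * so writes to different stripes update parity on different disks. */
static int parity_disk(int stripe) {
    return NDISKS - 1 - (stripe % NDISKS);
}

int main(void) {
    for (int s = 0; s < 5; s++)
        printf("stripe %d: parity on disk %d\n", s, parity_disk(s));
    return 0;
}
```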
36
RAID-6
• Similar to RAID-5 but with "dual distributed parity"
• ECC_p = XOR(A0, B0, C0); ECC_q = Code(A0, B0, C0, ECC_p)
• Sustains 2 drive failures with no data loss
• Minimum requirement: 4 disks
  – 2 for data striping
  – 2 for dual parity
[Figure: five disks holding data blocks A to E with two parity blocks per stripe (ECC_p and ECC_q), both rotated across all of the disks]
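The slide leaves the second code unspecified (ECC_q = Code(...)). One common realization, used for example by Linux md RAID-6, computes Q as a Reed-Solomon style syndrome over GF(2^8) of the data blocks only, which differs slightly from the slide's formulation that folds ECC_p into the code; the sketch below shows that P/Q construction purely as an illustrative assumption.

```c
#include <stdint.h>
#include <stdio.h>

#define NDATA 3   /* data blocks per stripe (illustrative) */

/* Multiply by x (i.e., by 2) in GF(2^8) with polynomial 0x11D. */
static uint8_t gf_mul2(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0x00));
}

/* P is plain XOR parity; Q is the syndrome sum of (2^i * D_i) over GF(2^8),
 * evaluated Horner-style. Together P and Q tolerate two lost blocks. */
static void pq_parity(const uint8_t d[NDATA], uint8_t *p, uint8_t *q) {
    uint8_t pp = 0, qq = 0;
    for (int i = NDATA - 1; i >= 0; i--) {
        pp ^= d[i];
        qq = (uint8_t)(gf_mul2(qq) ^ d[i]);
    }
    *p = pp;
    *q = qq;
}

int main(void) {
    uint8_t d[NDATA] = { 0x12, 0x34, 0x56 };
    uint8_t p, q;
    pq_parity(d, &p, &q);
    printf("P = 0x%02X, Q = 0x%02X\n", p, q);
    return 0;
}
```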