High-Bandwidth Memory Interface Design
Chulwoo Kim
Dept. of Electrical Engineering, Korea University, Seoul, Korea
February 17, 2013
Chulwoo Kim 1 of 86
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
Outline
Introduction
  DRAM 101
  Simplified DRAM Architecture and Operation
  Differences of DRAM (DDRx, GDDRx, LPDDRx)
  Trend
  Memory Interface: Differences and Issues
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
DRAM 101
SDRAM: Synchronous Dynamic Random Access Memory
SDR (Single Data Rate): one data bit (D) per DQ pin per CLK cycle
DDR (Double Data Rate): two data bits (D D) per DQ pin per CLK cycle
Main Memory: DDRx (PC, Notebook, Server)
Graphics Memory: GDDRx (Graphic Card, Console)
Mobile Memory: LPDDRx (Phone, Tablet PC)
[Figure: the MCU sends CLK & Command to the SDRAM and exchanges Data with it; after a Command (C), data (D) appears on DQ after the CAS* Latency and lasts for the Burst Length]
*CAS : Column Address Strobe
DRAM DDR4 Die Photo
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
[Die photo: 16 banks, Bank 0 to Bank 15]
Supply Voltage VDD=1.2V, VPP=2.5V
Process 38nm CMOS /3-metal
Banks 4-Bank Group, 16 Bank
Data Rate 2400 Mbps
Number of I/Os ×4 / ×8
Simplified DRAM Architecture
Bank: Cell Array, Row Decoder, Word Line Driver, Row Repair Fuse, Column Decoder, Column Repair Fuse, Write Drv. / Read Amp.
Cell Array: BLSA* on the bit-line pair (BLT, BLB), word line (WL)
Peripheral Circuit: CLK/ADD/CMD Buffer, CMD Controller, DLL, Generator, DQ RX (Serial to parallel), DQ TX (Parallel to serial)
Clock signals: ICLK, DCLK
* BLSA : Bit line sense amplifier
Concept of DRAM operation
WRITE: Serial to parallel (DQ → GIO)
READ: Parallel to serial (GIO → DQ)
[Figure: banks with BLSA exchange Np×Ndq bits with the Peripheral Circuit over GIO; DQ RX (Serial to parallel) and DQ TX (Parallel to serial) each carry Ndq bits]
*BLSA : Bit line sense amplifier
*GIO : Global I/O
*Np: Number of pre-fetch
*Ndq: Number of DQ
Pre-fetch Timing (DDR1, BL*=2)
[2] JEDEC, JESD79F, pp. 24-29
[Timing diagram: CLK, DQS, DQ, GIO; two RD commands spaced tCCD*=1 apart; after CL*, each RD returns a burst of two DQ bits (0 1)]
Number of GIO channels = Np×Ndq = 2×8 = 16 (DDR1 x8)
* tCCD : CAS to CAS delay * CL : CAS latency
* BL : Burst length
Pre-fetch Diagram (DDR1)
Num. of GIO channels = 2×Ndq
Pre-fetch operation: 2-bit pre-fetch
[2×Ndq] data access
(If the output data rate is 400 Mbps, the internal data rate is 200 Mbps)
[Figure: eight banks sharing the GIO bus]
Pre-fetch Timing (DDR2, BL=4)
[3] JEDEC, JESD79-2F, pp. 35
[Timing diagram: CLK, DQS, DQ, GIO; two RD commands spaced tCCD=2 apart; after RL*, each RD returns a burst of four DQ bits (0 1 2 3)]
Number of GIO channels = Np×Ndq = 4×8 = 32 (DDR2 x8)
* RL : READ latency
Pre-fetch Diagram (DDR2)
Num. of GIO channels = 4×Ndq
Pre-fetch operation: 4-bit pre-fetch
[4×Ndq] data access
(If the output data rate is 800 Mbps, the internal data rate is 200 Mbps, same as DDR1)
[Figure: eight banks sharing the GIO bus]
Pre-fetch Timing (DDR3, BL=8)
[4] JEDEC, JESD79-3F, pp. 62
[Timing diagram: CLK, DQS, DQ, GIO; two RD commands spaced tCCD=4 apart; after RL, each RD returns a burst of eight DQ bits (0 to 7)]
Number of GIO channels = Np×Ndq = 8×8 = 64 (DDR3 x8)
Pre-fetch Diagram (DDR3)
Num. of GIO channels = 8×Ndq
Pre-fetch operation: 8-bit pre-fetch
[8×Ndq] data access
(If the output data rate is 1.6 Gbps, the internal data rate is 200 Mbps, same as DDR1)
[Figure: eight banks sharing the GIO bus]
Bank Grouping Timing (DDR4, BL=8)
[5] JEDEC, JESD79-4, pp. 77-78 [6] T. Y. Oh et al., ISSCC 2010, pp. 434-435
[Timing diagram: CLK, DQS, DQ, GIO_BG0 to GIO_BG3; RD G0 is followed by RD G1 after tCCD_S=4, and a second RD G1 follows after tCCD_L=5; after RL, each RD returns a burst of eight DQ bits (0 to 7)]
Number of GIO channels = Np×Ndq×Ngroup = 8×8×4 = 256 (DDR4 x8)
Pre-fetch & Bank Grouping (DDR4)
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
Num. of GIO channels = 8×Ndq per bank group
Pre-fetch operation: 8-bit pre-fetch
Bank grouping
[Figure: banks arranged in Group0, Group1, Group2, Group3, connected through a GIO MUX]
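The pre-fetch arithmetic on the preceding slides can be checked with a small sketch. The class and field names below are hypothetical and chosen only for illustration; the numbers (Np, Ndq = 8, Ngroup = 4 for DDR4) are the ones quoted on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefetchConfig:
    """Illustrative container; the names are assumptions, not JEDEC terms."""
    name: str
    n_prefetch: int      # Np: number of pre-fetch bits per DQ
    n_dq: int            # Ndq: number of DQ pins
    n_groups: int = 1    # Ngroup: number of bank groups (DDR4 only here)

    def gio_channels(self) -> int:
        # Number of GIO channels = Np x Ndq x Ngroup
        return self.n_prefetch * self.n_dq * self.n_groups

    def internal_rate_mbps(self, output_rate_mbps: float) -> float:
        # The core runs Np times slower than the DQ pin.
        return output_rate_mbps / self.n_prefetch


GENERATIONS = {
    "DDR1": PrefetchConfig("DDR1", n_prefetch=2, n_dq=8),
    "DDR2": PrefetchConfig("DDR2", n_prefetch=4, n_dq=8),
    "DDR3": PrefetchConfig("DDR3", n_prefetch=8, n_dq=8),
    "DDR4": PrefetchConfig("DDR4", n_prefetch=8, n_dq=8, n_groups=4),
}

if __name__ == "__main__":
    for cfg in GENERATIONS.values():
        print(cfg.name, cfg.gio_channels())
```

The same 200 Mbps internal rate falls out for DDR1 at 400 Mbps, DDR2 at 800 Mbps, and DDR3 at 1600 Mbps, which is the point of widening the pre-fetch.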
Differences of DDRx, GDDRx, LPDDRx
Parameter | DDRx | GDDRx | LPDDRx
Architecture | [Figure: bank and PAD floorplan of each type]
Application | PC/Server | Graphic card | Mobile/Consumer
Socket | DIMM | On board | MCP*/PoP*/SiP*
IO | ×4/×8 | ×16/×32 | ×16/×32
Unique Function | (none listed) | Single uni-directional WDQS, RDQS; VDDQ termination; CRC, DBI, ABI | No DLL, DPD*, PASR*, TCSR*
* MCP: Multi chip package * PoP : Package on package * SiP : System in package
* DPD: Deep power down * PASR : Partial array self refresh * TCSR : Temperature compensated self refresh
DDR Comparison
Parameter | DDR1 | DDR2 | DDR3 | DDR4
VDD [V] | 2.5 | 1.8 | 1.5 | 1.2
Data Rate [bps/pin] | 200M~400M | 400M~800M | 800M~2.1G | 1.6G~3.2G
Pre-Fetch | 2 bit | 4 bit | 8 bit | 8 bit
STROBE | Single DQS | Differential DQS, DQSB | Differential DQS, DQSB | Differential DQS, DQSB
Interface | SSTL_2 | SSTL_18 | SSTL_15 | POD_12
New Feature | - | OCD calibration, ODT | Dynamic ODT, ZQ calibration, Write leveling | CA parity, DBI*, CRC*, Gear down, CAL*, PDA*, FGREF*, TCAR*, Bank grouping
* DBI: Data bus inversion * CRC: Cyclic redundancy check * CAL: Command address latency
* PDA: Per DRAM addressability * FGREF: Fine granularity refresh * TCAR: Temperature controlled array refresh
GDDR Comparison
GDDR1 gDDR2 GDDR3 GDDR4 GDDR5
VDD [V] 2.5 1.8 1.5 1.5 1.5/1.35
Data Rate [bps/pin]
300~900M 800M~1G 700M~2.6G 2.0G~3.0G 3.6G~7.0G
Pre-Fetch 2 bit 4 bit 4 bit 8 bit 8 bit
STROBE Single DQS Differential Bi-direction
DQS*, DQSB Single Uni-direction WDQS, RDQS
Interface SSTL_2 SSTL_2 POD-18 POD-15 POD-15
New Feature
OCD* calibration ODT*
ZQ DBI Parity(opt)
No DLL PLL(option) WCK, WCKB
CRC ▪ ABI* RDQS(option) Bank grouping
* DQS: DQ strobe signal (DQ is the data I/O pin) * OCD: Off chip driver
* ODT: On die termination * ABI: Address bus inversion
LPDDR Comparison
LPDDR1 LPDDR2 LPDDR3
VDD [V] 1.8 1.2 1.2
Data Rate [bps/pin]
200M~400M 200M~1066M 333M~1600M
Pre-Fetch 2 bit 4 bit 8 bit
STROBE DQS DQS_T, DQS_C DQS_T, DQS_C
Interface SSTL_18* HSUL_12* HSUL_12*
DLL X X X
New Feature
CA pin ODT (High tapped termination)
* SSTL: Stub series terminated logic * HSUL: High speed un-terminated logic
Trend
Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing
[Figure: VDD [V] (1.2 to 2.5) versus Data Rate [Gbps] (0.2 to 7.0) for DDR1, DDR2, DDR3, DDR4, GDDR1, gDDR2, GDDR3, GDDR4, GDDR5, LPDDR1, LPDDR2, LPDDR3]
Memory Interface
System Feature
Single-ended, high speed
Many channels (vulnerable to coupling)
DDR: multi-drop (multi rank, multi DIMM)
GDDR: point to point
Impedance discontinuities (stubs, connectors, vias, etc.)
Issue
Reflection
Inter-symbol interference
Simultaneous switching output noise
Pin to pin skew
Poor transistor performance
[Figure: a CPU connected to many DRAMs (multi-drop) and a GPU connected to a few DRAMs (point to point)]
Outline
Introduction
Clock Generation and Distribution
  Delay-locked loop (DLL)
  Duty cycle corrector (DCC)
  Clock distribution
Transceiver Design
TSV
Conclusions
References
Basic DLL Architecture
[Figure: the external Clock enters the DRAM through the input path (tD1) as I_CLK, passes the Variable Delay Line (tDVDL) to become O_CLK, and clocks DATA from the memory core out through the output path (tD2); the Replica Delay (tDREP) returns FB_CLK to the PD, which steers the Controller. Timing diagram: Clock, I_CLK, FB_CLK, O_CLK, Data]
tCK ∙ N = tDVDL + tDREP
tDREP ≈ tD1 + tD2
tCK ∙ N = tDVDL + tD1 + tD2 + γ
γ = tDREP – (tD1 + tD2)
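The lock condition tCK ∙ N = tDVDL + tDREP can be turned into a quick calculation. The sketch below is an illustration only: the function names and the picosecond values in the usage note are assumptions, and a real DLL reaches this point iteratively through the phase detector rather than by solving the equation.

```python
import math


def dll_lock_point(t_ck, t_drep, t_dvdl_min, t_dvdl_max):
    """Return (N, tDVDL) satisfying N*tCK = tDVDL + tDREP with the smallest N.

    All arguments share one time unit (for example picoseconds).
    """
    # Smallest integer N for which the required tDVDL is at least the
    # minimum (intrinsic) delay of the variable delay line.
    n = max(1, math.ceil((t_dvdl_min + t_drep) / t_ck))
    t_dvdl = n * t_ck - t_drep
    if t_dvdl > t_dvdl_max:
        raise ValueError("variable delay line is too short to lock")
    return n, t_dvdl


def replica_mismatch(t_drep, t_d1, t_d2):
    """gamma = tDREP - (tD1 + tD2); zero when the replica is perfect."""
    return t_drep - (t_d1 + t_d2)
```

For example, with a hypothetical tCK of 2500 ps, tDREP of 1800 ps, and a delay line spanning 300 ps to 3000 ps, the loop locks with N = 1 and tDVDL = 700 ps; at tCK = 1250 ps the same design needs N = 2.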
Replica Delay Mismatch
[Figure: valid data window within tCK and tDQSCK* (or tAC) at VDDL, VDD, and VDDH, for γ ≈ 0 and for γ > 0, with long and short cases marked; plot of γ variation [ps] versus Supply Voltage [V]]
*tDQSCK (or tAC) – DQS output access time for CK/CKb
Locking Range Considerations
[7] H.-W. Lee et al., submitted to TVLSI
[Figure: I_CLK and FB_CLK waveforms for short and long tCK, marking tDINIT+tDREP and tDREQUIRED; plot of tDQSCK (or tAC) versus tCK showing a bird’s beak]
tDINIT = tDVDL(0) + tDREP
N×tCK > tDVDL(0) + tDREP
tCK = tDVDL + tDREP + t∆
Synchronous Mirror Delay (SMD)
[8] T. Saeki et al., ISSCC 1996, pp. 374-375
Basic Operation
Measure and replicate the delay
No feedback
Match delay in two cycles
[Figure: Clock passes the input path (tD1) to I_CLK, then the Replica Delay (tD1+tD2), the Delay Measure Delay Line (tD3), the Replicate Delay Line (tD3), and the output path (tD2) to OUT; timing diagram of Clock, I_CLK, Measure, Replicate, OUT]
Disadvantages of SMD
Disadvantages
Mismatch between replica delay and input buffer & clock distribution
Coarse resolution
Input jitter multiplication: input pk-pk jitter (±Δ) becomes output pk-pk jitter (±2Δ)
[Figure: SMD block diagram and timing of Clock, I_CLK, and OUT without and with jitter; with input edges shifted by -Δ and +Δ, the measured delay tCK-(tD1+tD2) becomes tCK-(tD1+tD2)+2Δ]
Register Controlled DLL
[9] A. Hatakeyama et al., ISSCC 1997, pp. 72-73
Locking information is stored digitally in register
Vernier type delay line increases resolution
[Figure: Main Delay Line and Sub Delay Line between IN and OUT, built from unit delays tD (fan-out=1) and tD+Δ (fan-out=2), with switches SW0 to SW4; detail of the tap between SW(n-1) and SW(n)]
Single Register Controlled Delay Line
[Figure: I_CLK drives a Coarse Delay line of unit delays tUD with taps selected by CSL1, CSL2, CSL3; adjacent tap outputs OUT1 and OUT2 drive IN1 and IN2 of the Phase Mixer, weighted by 1-K and K, to give OUT12; the Fine Delay Controller sets K from UP/DN* from the PD]
*DN=Down
Boundary Switching Problem
OUT12 = IN1×(1-K) + IN2×K
Coarse shift & fine reset do not occur simultaneously
[Figure: I_CLK passing through 3 UDCs* and through 4 UDCs drives IN1 (K=0) and IN2 (K=1) of the Phase Mixer, tUD apart; the shift left occurs at K=0.9]
*UDC=Unit delay cell
Seamless Boundary Switching
[10] J.-T. Kwak et al., VLSI 2003, pp. 283-284
Dual Coarse Delay Line
OUT12 = IN1×(1-K) + IN2×K (0≤K≤1)
Fine set first and then coarse shift
[Figure: Clock drives two coarse delay lines of Unit Delay Cells whose outputs IN1 (K=0) and IN2 (K=1) are tUD apart at the Phase Mixer; K is raised from 0.9 to 1.0 before the shift left]
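The two boundary-switching slides can be compared with a simplified numerical model of the phase mixer, OUT12 = IN1×(1-K) + IN2×K. Everything below is a sketch under assumptions: delays are treated as ideal, `T_UD` is a hypothetical 100 ps unit delay, and the tap indices are illustrative rather than taken from the cited design.

```python
T_UD = 100.0  # hypothetical unit delay of one coarse cell, in ps


def mixed_delay(tap1, tap2, k):
    """Delay of OUT12 when IN1 and IN2 come from coarse taps tap1 and tap2."""
    return (1.0 - k) * tap1 * T_UD + k * tap2 * T_UD


def single_line_shift(n, k):
    """Single coarse line: the coarse shift lands before the fine code resets.

    Returns the delay before the shift, the transient delay, and the
    intended delay once K has been reset to 0.
    """
    before = mixed_delay(n, n + 1, k)
    transient = mixed_delay(n + 1, n + 2, k)   # taps moved, K still old
    settled = mixed_delay(n + 1, n + 2, 0.0)
    return before, transient, settled


def dual_line_shift(n):
    """Dual coarse line: K reaches 1.0 first, then only IN1 is moved.

    With K = 1.0 the weight of IN1 is zero, so moving IN1 from tap n to
    tap n + 2 does not disturb OUT12.
    """
    before = mixed_delay(n, n + 1, 1.0)
    during = mixed_delay(n + 2, n + 1, 1.0)
    return before, during


if __name__ == "__main__":
    print(single_line_shift(3, 0.9))
    print(dual_line_shift(3))
```

In this model the single-line case jumps by one full unit delay during the transient, while the dual-line case shows no jump, which matches the "fine set first and then coarse shift" ordering on the slide.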
Adaptive Bandwidth DLL w/ SDVS*
[11] H.-W. Lee et al., ISSCC 2011, pp. 502-504
[Figure: I_CLK drives the Variable Delay Line to O_CLK; the Replica Delay returns FB_CLK to the PD and Controller; an Update Period Pulse Gen. produces the Update Pulse and sends NCODE to the upper block; timing of I_CLK, FB_CLK, and Update Pulse]
Update Period: m×tCK-tDREP+tDREP = m×tCK; for m=2, BWDLL = 1/(2×tCK)
[Plot: Fine Unit Delay vs. Mode (DN, BASE, UP), vertical axis 6 to 18 ps, for Low-Speed Mode and High-Speed Mode; values 15.9 ps, 10.2 ps, 7.8 ps]
*SDVS: Self-dynamic voltage scaling
Duty Cycle Corrector (DCC)
DCC
Reduces duty cycle error
Enlarges valid data window for DDR
Needs to correct ±15% duty error at max speed
Can be implemented as either an analog or a digital circuit
DCC Design Issues
Location of DCC (before/after DLL)
Embedded in DLL or not
Power consumption
Area
Operating frequency range
Locking time in case of digital DCC
Offset of duty cycle detector
Digital DCC
[Figure: three digital DCC types, each producing a 50%/50% OUT from IN:
Invert-Delay Clock Generator + Phase Mixer (invert and delay IN, then mix);
Pulse Width Controller + Duty Cycle Detector;
Half-Cycle Delayed Clock Generator + Edge Combiner (combine edges of IN and the half-cycle delayed HD_IN)]
DCC in GDDR5
[12] D. Shin et al., VLSI 2009, pp. 138-139
DCA is not in clock path
No jitter addition
[Figure: WCK/WCKb clock path (RX, Divider, PLL with DQPLL sel., 4-phase Global Driver, Repeater, CML2CMOS, DQ clock distribution network) with a Duty-Cycle Adjuster (DCA; X1, X2, X4, X8 branches, decreasing CML_bias) at the RX outputs rxclk/rxclkb, controlled by the Duty Cycle Detector, Adder-based Counter, Control Pulse Generator, and Decoder through up/dn codes]
DLL-related Parameters & Reference
Parameter | DDR1 | DDR2 | DDR3/DDR3L | GDDR3 | GDDR4
VDD | 2.5V | 1.8V | 1.5V/1.35V | 1.8V | 1.5V
Lock time | 200 cycles | 200 cycles | 512 cycles | 2~5K cycles | 2~20K cycles
Max. tDQSCK | 600ps | 300ps | 225ps | 180ps | 140ps
Nominal speed | 166MHz | 333MHz | 333MHz~800MHz | 600MHz~1.37GHz | 1.6GHz
tXPDLL*(tXARD) | 1×tCK | 2×tCK | 10×tCK | 7×tCK+tIS | 9×tCK+tIS
Max. tCK | 12ns | 8ns | 3.3ns | 3.3ns | 2.5ns
RELATED AREA
DCC block
Variable Delay Line
Delay Control Logic
Replica
Low Jitter
REFERENCE Type
[23]** [14] [18] [19]** [20] [22] [24] [25]* [26]
[23]* [26] [13] [15]** [16] [18] [20] [21]** [31] [32]* [33]**
[27][28]** [29] [30]
[29] [30]** [34]* [35]*
[32] [27] [28]** [30]**
[14] [36]* [15]** [16] [32]* [24] [26] [27] [17]** [19]**
[14] [25]* [28]**
tXPDLL*(tXARD) – Timing for exit precharge power-down to any non-READ command
[ ] digital
[ ]* mixed
[ ]** analog
[13] [14] [15]** [16] [17]** [19]** [20] [21]** [18]
Clock Distribution
[37] S.-J. Bae, et al., ISSCC, 2011, pp. 498-500
Clock Distribution Issues
Clock skew among DQs
Low power
Robust under PVT variations
CML to CMOS converter jitter
[Figure: CK/CKB drive a Global Clock Buffer that feeds the two rows of DQs; dimension labels 1,200μm and 93,750μm]
CML to CMOS Converter
Global Clock Buffer
Current mode logic (CML) : high-speed clock
CML to CMOS Converter Issue
Susceptible to noise
Jitter
[Figure: Global Clock Buffer (CLKP, CLKN to OUTP, OUTN) drives a 1700μm line to the CML to CMOS Converter (CLKP, CLKN to CLKOUT) at the DQ]
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
  Channel
  Pre-emphasis
  Equalizer
  Crosstalk and skew
  Training
  Input buffer
  Output driver
  DBI/CRC
TSV Interface for DRAM
Summary
References
[Figure: one side of the channel (CH) with Output driver, Training, Pre-emphasis, DBI/CRC; the other side with Input buffer, Training, Equalizer, DBI/CRC]
Channel Characteristics
GDDRx
Point to point connection
Performance target
• High data rate
Few reflection components
• PCB vias
DDRx
Multidrop
Performance and power
Many reflection components
• PCB vias, DIMM connector, etc.
[Figure: GPU with surrounding GDDRx devices; CPU socket with DIMM slots]
Emphasis for Channel Compensation
[Figure: time domain (amplitude versus time): the original signal is distorted by the channel; with an FFE between D(in) and D(out), the signal is shaped before the channel. Frequency domain (amplitude versus frequency, marked at fdata/2): channel response, FFE response, and combined FFE + channel response]
Pre-emphasis vs. De-emphasis
Pre-emphasis : Transition Bit Boosting
De-emphasis : Non-transition Bit Suppression
[Figure: waveforms versus time for 1-tap pre-emphasis, no emphasis, and 1-tap de-emphasis, each marked with level Va]
Basic De-emphasis Circuit
The Number of Taps
Depends on the channel quality and bit rate
Usually from one to three taps
[Figure: Din, X(n), is scaled by K0; a D flip-flop (unit delay) provides the previous bit, scaled by -K1; the sum is Dout, Y(n)]
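The one-tap circuit above behaves as a two-coefficient FIR filter, Y(n) = K0·X(n) - K1·X(n-1). The sketch below illustrates that relation only; the coefficient values, the ±1 symbol mapping, and the assumption that the line idles at the first symbol are all choices made for the example.

```python
def de_emphasis(bits, k0=0.75, k1=0.25):
    """Apply Y(n) = K0*X(n) - K1*X(n-1) to a bit sequence.

    Bits are mapped to +1/-1 symbols. The sample before the first bit is
    assumed equal to the first bit (line idle at that level).
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0] if symbols else 0.0
    for x in symbols:
        out.append(k0 * x - k1 * prev)
        prev = x
    return out


if __name__ == "__main__":
    print(de_emphasis([0, 1, 1, 1, 0]))
```

With K0 + K1 = 1, a transition bit is driven at full swing (±1.0) and a repeated bit is suppressed to ±(K0 - K1), which is the non-transition bit suppression described on the previous slide.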
Pre-emphasis Circuit [1/2]
[38] K.-H. Kim et al., JSSC, Jan 2006, pp. 127-134
Cascaded Pre-emphasis
Internal node ISI due to limited transistor performance at high speed
Internal node pre-emphasis ratio would not be affected by the channel
Less sensitive to the system environment or channel variations
[Figure: Din(n), Din(n-1), Din(n-2) pass 4:2 and 2:1 multiplexer stages to the Driver and Pre-emph. blocks that drive DQ and DQB; simulated Voltage [V] versus Time [psec] for No Pre-emphasis, Conventional Pre-emphasis, and Proposed Pre-emphasis]
Pre-emphasis Circuit[2/2]
[39] H. Partovi et al., ISSCC, 2009, pp.136-137
Voltage Mode Driver Pre-emphasis
Additional zero by CC
Time continuous pre-emphasis
[Figure: TX with Pre-Driver and Main Driver (RT) plus a Pre-Driver and Pre-Emph. Driver (RC) coupled through Boosting Capacitor CC to Dout, with CP, the channel (CH), and CL and RT at the GPU; Equivalent Linear Model with Din, RC, RT, CC, CP, CL, Dout; bandwidth (BW) comparison]
Decision Feedback Equalization (DFE)
[40] Y. Hidaka, CMOS Emerging Technologies Workshop, May 2010
DFE cancels ISI without noise amplification
Clock must be provided by DLL or PLL
Critical path (feedback path) is important
[Figure: amplitude versus time, four panels: (A) 1UI pulse, (B) pulse with ISI, (C) emulated ISI, (D) result with no ISI]
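The idea in panels (A) to (D), subtracting an emulated copy of the ISI using the previous decision, can be sketched with a one-tap model. The channel below is an assumption made for illustration (a single post-cursor of 0.4 on ±1 symbols, no noise); the function names are hypothetical.

```python
def channel(symbols, post_cursor=0.4):
    """Toy channel: each symbol leaks post_cursor of itself into the next UI."""
    out, prev = [], 0.0
    for s in symbols:
        out.append(s + post_cursor * prev)
        prev = s
    return out


def dfe_receive(samples, alpha):
    """One-tap DFE: subtract alpha * previous decision, then slice.

    Returns the decisions and the equalized values seen by the slicer.
    alpha = 0 disables the feedback.
    """
    decisions, equalized, prev = [], [], 0.0
    for v in samples:
        e = v - alpha * prev            # remove the emulated ISI
        d = 1.0 if e >= 0 else -1.0     # decision
        equalized.append(e)
        decisions.append(d)
        prev = d
    return decisions, equalized


def eye_margin(symbols, equalized):
    """Smallest distance of the slicer input from the threshold, signed by the sent symbol."""
    return min(s * e for s, e in zip(symbols, equalized))


if __name__ == "__main__":
    tx = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
    rx = channel(tx)
    for alpha in (0.0, 0.4):
        dec, eq = dfe_receive(rx, alpha)
        print(alpha, dec == tx, eye_margin(tx, eq))
```

In this noise-free model the margin grows from about 0.6 to about 1.0 when the feedback tap matches the post-cursor. The decision must be available before the next sample is sliced, which is why the feedback path is the critical path.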
[41] S.-J. Bae et al., ISSCC, 2008, pp. 278-279
The previously captured data must be fed back to the receiver within 1UI
[Figure: four DFE sense amplifiers (DFE SA) compare DQ with Vref on clocks WCK/2_0, WCK/2_90, WCK/2_180, WCK/2_270; their outputs P0, P90, P180, P270 (and complements such as P0b, P270b) are fed back with weight α and drive SR latches that give D0, D90, D180, D270. Timing: Precharge and Evaluation phases of WCK/2_270 and WCK/2_0 with P270, P0 and outputs D270, D0, D90; TFB = TSA]
Crosstalk
Crosstalk is coupling of energy from one line to another
Timing Effect
Timing Jitter
Signal Integrity
Near end crosstalk, Far end crosstalk
[Figure: an input signal travelling from the near end to the far end couples into the neighboring line through mutual capacitance Cm (current ICm) and mutual inductance Lm (current ILm)]
Inear = ICm + ILm
Ifar = ICm - ILm
Staggered Memory Bus
No discrepancy of propagation delay due to the crosstalk
Difference of transition point is τ/2
Distance between channels with the same transition is increased
Jitter due to coupling from the adjacent channel is reduced
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
[Figure: Staggered Memory Bus between MCU and DRAM; adjacent channels with transition period τ]
Glitch Canceller
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
Compensation for glitch by adding or subtracting current
Rise : ICOMP is added to the main driver
Fall : ICOMP is subtracted from the main driver
[Figure: drivers TX1, TX2, TX3 with data DTX1, DTX2, DTX3; a Transition Detector senses Rise/Fall on the aggressor and the victim driver is biased with IBIAS+ICOMP; waveforms of aggressor (DTX1, Rise/Fall) and victim (DTX2)]
Crosstalk Equalizer (TX)
[37] S.-J. Bae et al., ISSCC, Feb. 2011, pp. 498-500
Crosstalk equalization at transmitter
Cancel the crosstalk by the impedance calibration
[Figure: Crosstalk Equalizing Driver for DQ[0], driven by DO[0] and neighboring data DO[1:3] with enables EN[0:5]; timing of DO[0], DO[1], DQ[0], EN[0], EN[1] with offset ∆t]
Skew
Difference in flight time between signals
Skew can cause timing errors
Key design criterion in high-speed systems
[Figure: MCU/GPU drives CLK, Command, Address, DQS, and DQ to the DRAM (banks, peripheral circuit with DLL, CMD controller, serial/parallel conversion, generator); the signal groups arrive with different flight times TD, TD′, TD′′]
Pre/De-skew with Preamble Signal
Skew cancellation circuit is put in each DRAM
With estimated skew information
De-skew the data during write mode
Pre-skew the data during read mode
[43] S. H. Wang et al., JSSC, Apr. 2001, pp. 648-657
[Figure: Skewed Data pass the Data Delay Lines to become De-skewed Data; a Skew Estimator stores the Data[n] Skew in Register Files; a PLL and Mux derive the Sampling Clk from Ext.Clk; bus widths of 8 and 3 bits]
Fly-by Topology for DDR3
[4] JEDEC, JESD79-3E, pp. 56-59
T-branch Topology
CLK/CMD/Address are applied to each DRAM in parallel
Small skew between CLK and DQS
Fly-by Topology
Better signal integrity by reducing the number of stubs and the stub length
Easy to apply a single termination at the end of signal
DQ and DQS are applied to each DRAM at the same time
Large skew between CLK and DQS
Need to calibrate skew
[Figure: T-branch and fly-by routing of CLK, CMD, Address to DRAM#1 through DRAM#8, with DQ & DQS routed to each DRAM and VTT termination at the end of the fly-by line; Skew [s] per DRAM (DRAM#1 to DRAM#8) for each topology]
Write Leveling for DDR3
Write Leveling
Timing mismatch compensation between CLK and DQS
Write leveling is applied to each DRAM individually
[4] JEDEC, JESD79-3F, pp. 56-59
Push DQS to capture 0-1 transition
[Timing diagram: at the Source, diff_DQS is aligned to CK/CK# (T0 to T7); at the Destination, diff_DQS arrives skewed against CK/CK# (T0 to Tn) and DQ returns 0 or 1; once DQS is pushed far enough, DQ changes from 0 to 1]
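The "push DQS to capture 0-1 transition" loop can be sketched as a simple search. This is a behavioral model under assumptions: CK is an ideal 50% duty clock, the DRAM returns the CK level sampled at the DQS rising edge, times are integers in picoseconds, and the function names are hypothetical.

```python
def sample_ck(t, t_ck):
    """Level of an ideal CK at time t (high during the first half cycle)."""
    return 1 if (t % t_ck) < t_ck / 2 else 0


def write_leveling(ck_skew, t_ck, step, max_steps=256):
    """Increase the DQS delay until the sampled CK changes from 0 to 1.

    ck_skew is the extra flight time of CK to this DRAM relative to DQS.
    Returns the DQS delay at which the 0-to-1 transition is first seen.
    """
    prev = None
    for n in range(max_steps):
        dqs_delay = n * step
        # CK rising edges reach the DRAM at ck_skew + k * t_ck.
        feedback = sample_ck(dqs_delay - ck_skew, t_ck)
        if prev == 0 and feedback == 1:
            return dqs_delay
        prev = feedback
    raise RuntimeError("no 0-to-1 transition found within max_steps")


if __name__ == "__main__":
    print(write_leveling(ck_skew=300, t_ck=1000, step=25))
```

Requiring a 0 before the 1 matters: a DRAM whose DQS edge initially lands in the CK high phase must be pushed through the low phase first, so the loop does not stop on the first 1 it sees.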
Training for GDDR5
Adaptive Interface Training
Ensure the Widest Timing Margins for All Signals
Controlled by MCU
[44] W. Hubert et al., ATS, 2008, pp. 24-27
[Figure: GDDR5 Timing after Training, showing CK, CMD, ADDR, WCK, and DQ]
Training Sequence for GDDR5
[45] JEDEC, JESD212, pp. 23-39
Power Up: detect the configuration and mirror function; ODT setting
Address Training (optional): optimize address input data eye
WCK2CK Alignment Training: clock alignment; ready for read/write
READ Training: search for best read data eye; detect burst boundaries of read stream
WRITE Training: search for best write data eye; detect burst boundaries of write stream
Exit
Training Example : Write Training
[44] W. Hubert et al., ATS, 2008, pp. 24-27
[Figure: write data eyes between the Memory Controller and the GDDR5 Device, before and after training; data launched at t0 arrive with lane skews t1 and t2, and launch times t0 + t1 and t0 - t2 are marked for the trained case]
Input Buffer
Convert attenuated external signal to rail-to-rail signal
Trade-off between high speed operation and power consumption
[Figure: MCU/GPU drives CLK, Command, Address (m* channels), DQS, and DQ to the DRAM (banks, peripheral circuit with DLL, CMD controller, serial/parallel conversion, GEN)]
* m: The number of address channels, which depends on the type of memory and its density
Input Buffer Comparison
CMOS Type
Simple circuit
Low-speed input (CKE)
Susceptible to noise
Unstable threshold
Differential Type
Complex circuit
High-speed input
Robust to noise
Stable threshold
Commonly used
[Figure: CMOS-type buffer (In, En, OUT) and differential-type buffer (In, Vref, En, OUT)]
DDR4 Input Buffer
[46] K. Sohn et al., ISSCC, 2012, pp. 38-40
Gain Enhanced Buffer
Signal transition detector is added
The bias level (I) is controlled
Sensitivity can be enhanced at higher frequencies
Wide Common-Mode Range DQ Buffer
Delivers stable inputs to the second stage Amp.
Feedback network reduces the output common-mode variation
[Figure: DQ buffer with In, Vref, CMFB*, and a second-stage Amp.; gain enhanced In Buffer with In, Vref, a Transition Detector, and bias current I]
* CMFB : Common-mode feedback
Pseudo Open Drain (POD)
Impedance Calibration
Manual vs. Automatic
External Resistor: 240Ω
[Figure: I/O Buffer with Pull-UP and Pull-DOWN legs driven by Din, connected to the Channel]
Impedance Calibration
Thermometer Code Control
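[47] C. Park et al., JSSC, Apr. 2006, pp. 831-838
[Figure: an external resistor at the ZQ PAD forms a divider with a pull-up (PU) replica; a comparator (Vref, En) updates the PU register (REG) to give the n-bit PUcon; a second PU replica with a pull-down (PD) replica and comparator updates the PD register to give the n-bit PDcon; the Dout driver uses identical legs (WP, R and WN, R) gated by Din with PUcon and PDcon]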
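A thermometer-code calibration loop of this kind can be sketched as follows. The model is an assumption made for illustration: each enabled pull-up leg is an ideal resistor of equal value, the comparator reference is half of VDDQ, and the 240Ω external resistor is the one quoted on the POD slide. It does not reproduce the circuit of the cited paper.

```python
R_ZQ = 240.0  # external reference resistor at the ZQ pad, in ohms


def pullup_resistance(code, r_leg):
    """Thermometer code: 'code' identical legs of r_leg ohms in parallel."""
    if code == 0:
        return float("inf")
    return r_leg / code


def calibrate_pullup(r_leg, n_legs=16):
    """Enable legs one at a time until the ZQ pad voltage reaches Vref.

    The pad voltage is normalized to VDDQ and Vref is assumed to be 0.5,
    so the loop stops when the pull-up is no larger than R_ZQ.
    Returns the number of enabled legs (the thermometer code).
    """
    for code in range(1, n_legs + 1):
        r_pu = pullup_resistance(code, r_leg)
        v_zq = R_ZQ / (R_ZQ + r_pu)      # divider: PU on top, R_ZQ to ground
        if v_zq >= 0.5:                  # comparator flips
            return code
    return n_legs                        # out of range: all legs on


if __name__ == "__main__":
    for r_leg in (1920.0, 2400.0, 3000.0):   # hypothetical PVT corners
        print(r_leg, calibrate_pullup(r_leg))
```

The resulting code would then be copied to every output driver; the pull-down code is found the same way against the calibrated pull-up replica.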
Multi Slew-rate Output Driver
Binary-weighted Code Control
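[48] D. U. Lee et al., ISSCC, 2008, pp. 280-613
DF = Digital LPF + UP/DOWN Counter
[Figure: ZQ PAD calibration loop with an external resistor, PU and PD replicas, comparators (Vref, En), and DF blocks that generate the n-bit PUcon and PDcon; the Dout driver uses binary-weighted legs, widths WP/4, WP/2, WP, ..., 32WP and WN/4, WN/2, WN, ..., 32WN with resistors 128R, 64R, 32R, ..., R, gated by Din with PUcon and PDcon; driver settings of 60Ω, 120Ω, 240Ω]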
Global ZQ Calibration
Global Impedance Mismatch Error < 1%
PVT variation sensor
[49] J. Koo et al., CICC, 2009, pp. 717-720
[Figure: the ODT calibration block at the ZQ pin (Zcal, i0cal, Ref.) generates a Global Reference Signal; DQ0 and DQn (n=1~31) each contain LS, PA, CP, and LO blocks that calibrate the local impedance Z against it]
CP: Comparator, PA: Pre-amplifier, LS: Local PVT sensor, LO: Local controller
Data Bus Inversion (DBI)
Power reduction technique independent of data pattern
Dominant power (I/O Buffer)
P = α × C_PCB × VDD^2, α < 0.5
For high-BW memory, inversion time + CRC can be a bottleneck
[50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
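One way to keep the activity factor α of a byte lane at or below 0.5 is transition-based inversion (often called DBI-AC). The sketch below assumes that variant for illustration; the slide does not say which DBI flavor is meant, and level-based variants (counting zeros or ones) also exist. Function names are hypothetical.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def dbi_ac_encode(prev_tx, data):
    """Encode one byte given the byte previously driven on the bus.

    If more than 4 of the 8 data lines would toggle, send the inverted
    byte and assert the DBI flag. Returns (tx_byte, dbi_flag).
    """
    toggles = popcount((prev_tx ^ data) & 0xFF)
    if toggles > 4:
        return (~data) & 0xFF, 1
    return data & 0xFF, 0


def dbi_decode(tx, flag):
    """Undo the inversion at the receiver."""
    return (~tx) & 0xFF if flag else tx & 0xFF


if __name__ == "__main__":
    tx, flag = dbi_ac_encode(0x00, 0xFF)
    print(hex(tx), flag, hex(dbi_decode(tx, flag)))
```

The toggle count and the comparison sit in the data path before the output driver, which is why the inversion time shows up in the speed budget together with CRC.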
Cyclic Redundancy Check (CRC)
Data error check for every unit interval (64 bits – data only)
Redundancy bit : 1 bit/byte
Speed bottleneck for high-BW
Time (READ DBI + READ CRC + CRC calculator) < 9 periods
[50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
Error type Detection rate
random single bit 100%
random double bit 100%
random odd count 100%
burst ≤ 8 100%
CRC (cont’d)
x^8 + x^2 + x + 1 with an initial value of 0
Algorithm for GDDR5: ATM-0x83
Logic for algorithm takes a long time
To increase CRC speed: XOR logic optimization
CRC calculation time < TCRC
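The polynomial x^8 + x^2 + x + 1 with a zero initial value corresponds to the widely used CRC-8 with generator 0x07. The bit-serial sketch below illustrates the arithmetic only. It processes bytes most significant bit first, which is an assumption; the mapping of GDDR5 burst bits and lanes onto the CRC input is defined by the JEDEC specification and is not reproduced here.

```python
CRC8_POLY = 0x07  # x^8 + x^2 + x + 1, with the x^8 term implicit


def crc8(data: bytes, init: int = 0x00) -> int:
    """Bit-serial CRC-8, MSB first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ CRC8_POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


if __name__ == "__main__":
    print(hex(crc8(b"123456789")))  # prints 0xf4
```

In hardware the eight inner iterations per byte are unrolled into a tree of XOR gates over all input bits, and reducing the depth of that tree is the XOR logic optimization named on the slide.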
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
  Bandwidth requirement
  DRAM with TSV
  TSV DRAM type
  DRAM stacking type
  Data conflict issue & solution
  Failed TSV issue & solution
Summary
References
Bandwidth Requirements
Requirement
Next GDDR will require over 10 Gb/s/pin data rate
Restrictions
Very difficult over 10 Gb/s/pin
Cost for performance improvements
Power consumption
[Figure: DDRx / GDDRx Data Rate/Pin Trend, Data Rate/Pin [Gbps] (0 to 12) versus year (2000 to 2015) for DDR, DDR2, DDR3, DDR4, GDDR3, GDDR4, GDDR5, and a future point marked "?"]
Type | Gb/s/chip | Gb/s/pin
GDDR1 | 32 | 1
GDDR3 | 51.2 | 1.6
GDDR4 | 102.4 | 3.2
GDDR5 | 224 | 7
GDDR? | 448 (?) | 14 (?)
DRAM with TSV
Advantages of DRAM with TSV
Higher density per area
Shorter interconnection : lower power, faster flight time
Higher bandwidth with wide I/O
Wide I/O easily reaches the 448 Gb/s/chip expected of the next GDDR
(Example : 800 Mb/s/pin × 512 I/O = 409.6 Gb/s/chip; 448 Gb/s/chip needs 875 Mb/s/pin)
[Figure: MCU/GPU stacked with Wide I/O Memory through TSVs; MCU/GPU beside a stack of Memory dies on an Interposer]
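The bandwidth figures above follow from multiplying the per-pin rate by the I/O count. The sketch below restates that arithmetic; the function names are hypothetical, and the 32 I/O per GDDR chip is inferred from the table on the bandwidth slide (224 Gb/s/chip at 7 Gb/s/pin).

```python
def chip_bandwidth_gbps(rate_gbps_per_pin, n_io):
    """Aggregate bandwidth of one chip in Gb/s."""
    return rate_gbps_per_pin * n_io


def required_pin_rate_gbps(target_gbps, n_io):
    """Per-pin rate needed to reach a target chip bandwidth."""
    return target_gbps / n_io


if __name__ == "__main__":
    print(chip_bandwidth_gbps(7, 32))          # GDDR5, x32
    print(chip_bandwidth_gbps(0.8, 512))       # wide I/O at 800 Mb/s/pin
    print(required_pin_rate_gbps(448, 512))    # wide I/O rate for 448 Gb/s
```

A 512-bit interface therefore needs well under 1 Gb/s/pin for the bandwidth that a 32-bit GDDR interface would only reach at 14 Gb/s/pin.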
TSV DRAM Type
Type | Main Memory | Mobile | Graphics
Architecture | [Figure: TSV-stacked DRAM arrangement for each type, including Package, Controller, GPU, and Interposer]
No. of TSV | 500~1000 EA | 1000~1500 EA | 2000~3000 EA
Feature | Low power, High speed | Low power, Multi channel, Wide I/O | Max bandwidth, Multi channel
Stacking Type
Type | Homogeneous | Heterogeneous
Architecture | [Figure: stack of identical chips] | [Figure: three Slave chips stacked on a Master chip]
Feature | Same chips, Low cost | Slave : only cells, Master : with peripheral
Data Conflict Issue
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
PVT variations cause the data skew
Data conflict increases the short-circuit current
[Figure: DQ streams of the slowest chip and the fastest chip overlap under PVT variations, producing a data conflict; the DQ drivers of CHIP 0 (MP0, MN0, EN0, /EN0) and CHIP 3 (MP3, MN3, EN3, /EN3) share one TSV to the DQ pin, so one can drive HIGH while the other drives LOW]
Separate Data Bus per Group
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
Separate Data Bus per Bank Group
Less dependent on the PVT variation
[Figure: Rank 0 to Rank 3, each with bank Group A and Group B and a separate TSV array per group]
DLL-Based Self-Aligner
Data alignment to external clock or clock of the slowest chip
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
[Figure: in each of CHIP 0 to CHIP 3, a Skew Detector (PD1, PD2), Skew Compensator, Fine Aligner, and Replica with TSV Model adjust C_CLK through UP/DN; MODE (SAM MODE) and READ/READb select between the REAL PATH and the replica path; clocks CK, TRCLK, RFBCLK, TFBCLK, CLKOUT; the PIN is DQS or a Dummy Pin; data pass pipe latches and latches to become aligned data]
Failed TSV Issue
a. TSV plating defect b. pinch-off
Decreasing the assembly yield
Increasing the total cost
Failed TSV
[53] D. Malta et al., ECTC, 2010, pp. 1779-1775
TSV Check
A TSV connectivity check by using the internal circuit
Test Signal Generating Circuits
Scan Chain Based Testing Circuits
[54] A.-C. Hsieh et al., TVLSI, Apr. 2012, pp. 711-722
[Figure: Sender End inputs In_0 to In_4 drive TSV_0 to TSV_4; Receiver End outputs Out_0 to Out_4]
TSV Repair
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
Redundant TSVs for Failed TSV
Conventional : redundant TSVs are dedicated and fixed
Proposed : failed TSV is repaired with a neighboring TSV
[Figure: Conventional: signals A, B, C, D on Chip1 reach A′, B′, C′, D′ on Chip2 through TSVs a, b, c, d, with dedicated redundant TSVs r1 and r2. Proposed: signals A, B, C, D reach A′, B′, C′, D′ through TSVs a to f, shifting to a neighboring TSV around a failed one]
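The neighbor-shift repair can be sketched as a remapping of signals onto the remaining good TSVs in order. This is a simplified illustration, not the circuit of the cited paper: the function name is hypothetical, and the labels a to f and A to D are taken from the figure.

```python
def assign_tsvs(signals, tsvs, failed):
    """Map each signal to a TSV, skipping failed TSVs and shifting the rest.

    signals: ordered signal names, e.g. ["A", "B", "C", "D"]
    tsvs:    ordered physical TSVs, e.g. ["a", "b", "c", "d", "e", "f"]
    failed:  set of TSVs found defective by the connectivity check
    """
    good = [t for t in tsvs if t not in failed]
    if len(good) < len(signals):
        raise ValueError("not enough working TSVs to carry all signals")
    # Each signal moves at most a few positions to a neighboring TSV.
    return dict(zip(signals, good))


if __name__ == "__main__":
    print(assign_tsvs(list("ABCD"), list("abcdef"), failed={"b"}))
```

Because every signal only moves to a nearby TSV, the routing detour stays short compared with steering a failed signal to a dedicated spare located elsewhere.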
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
Summary
Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing
For synchronization of external clock and output of DRAM, low power, small area, and low skew are important design parameters
To achieve high-BW memory, many design techniques have been and will be adopted from other high-speed wireline transceivers
TSV interface for DRAM might be a good solution to achieve high bandwidth and low power
Suggested Papers to See
17.1 “A 6.4Gb/s near-ground single-ended transceiver for dual-rank DIMM memory interface systems”
17.2 “A 27% reduction in transceiver power for single-ended point-to-point DRAM interface with the termination resistance of 4×Z0 at both TX and RX”
17.3 “A 5.7mW/Gb/s 24-to-240Ω 1.6Gb/s thin-oxide DDR transmitter with 1.9-to-7.6V/ns clock-feathering slew-rate control in 22nm CMOS”
17.4 “An adaptive-bandwidth PLL for avoiding noise interference and DFE-less fast precharge sampling for over 10Gb/s/pin graphics DRAM interface”
References
[1] K. Koo et al., “A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and ×4 half-page architecture”, in IEEE ISSCC Dig. Tech. Papers, pp. 40–41, 2012.
[2] JEDEC, JESD79F.
[3] JEDEC, JESD79-2F.
[4] JEDEC, JESD79-3F.
[5] JEDEC, JESD79-4.
[6] T.-Y. Oh et al., “A 7Gb/s/pin GDDR5 SDRAM with 2.5ns bank-to-bank active time and no bank-group restriction”, in IEEE ISSCC Dig. Tech. Papers, pp. 434–435, 2010.
[7] H.-W. Lee et al., “Survey and analysis of delay-locked loops used in DRAM interfaces”, submitted to IEEE Trans. VLSI Syst.
[8] T. Saeki et al., “A 2.5 ns clock access 250 MHz 256 Mb SDRAM with a synchronous mirror delay”, in IEEE ISSCC Dig. Tech. Papers, pp. 374-375, 1996.
[9] A. Hatakeyama et al., “A 256 Mb SDRAM using a register-controlled digital DLL”, in IEEE ISSCC Dig. Tech. Papers, pp. 72-73, 1997.
[10] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM”, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[11] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology”, in IEEE ISSCC Dig. Tech. Papers, pp. 502-504, 2011.
[12] D. Shin et al., “Wide-range fast-lock duty-cycle corrector with offset-tolerant duty-cycle detection scheme for 54nm 7Gb/s GDDR5 DRAM interface”, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 138-139, 2009.
[13] W.-J. Yun et al., “A 3.57 Gb/s/pin low jitter all-digital DLL with dual DCC circuit for GDDR3 DRAM in 54-nm CMOS technology,” IEEE Trans. VLSI Sys., vol. 19, no. 9, pp. 1718-1722, Nov. 2011.
[14] H.-W. Lee et al., “A 7.7mW/1.0ns/1.35V delay locked loop with racing mode and OA-DCC for DRAM interface,” in Proc. of Int. Symp. Circuits and Syst., pp. 3861-3864, 2010.
[15] B.-G. Kim et al., “A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522-1530, May 2009.
[16] W.-J. Yun et al., “A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66nm CMOS Technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 282-283, 2008.
[17] S. Kim et al., “A low jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 285-286, 2003.
[18] T. Matano et al., “A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 762-768, May 2003.
[19] K.-H. Kim et al., “Built-in duty cycle corrector using coded phase blending scheme for DDR/DDR2 synchronous DRAM application,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 287-288, 2003.
[20] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1 Gbps x32 DDR SDRAM,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[21] O. Okuda et al., “A 66-400 MHz, adaptive-lock-mode DLL circuit with duty-cycle error correction [for SDRAMs],” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 37-38, 2001.
[22] F. Lin et al., “A wide-range mixed-mode DLL for a combination 512 Mb 2.0 Gb/s/pin GDDR3 and 2.5 Gb/s/pin GDDR4 SDRAM,” IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 631-641, Mar. 2008.
[23] K.-W. Kim et al., “A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM With dual-clock system, four-phase input strobing, and low-jitter fully analog DLL,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2369-2377, Nov. 2007.
[24] D.-U. Lee et al., “A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with series pipelined CAS latency control and dual-loop digital DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 547-548, 2006.
[25] S.-J. Bae et al., “A 3Gb/s 8b single-ended transceiver for 4-drop DRAM interface with digital calibration of equalization skew and offset coefficients,” in IEEE ISSCC Dig. Tech. Papers, pp. 520-521, 2005.
[26] Y.-J. Jeon et al., “A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-cycle clock dividers for production DDR SDRAMs,” IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2087-2092, Nov. 2004.
[27] T. Hamamoto et al., “A 667-Mb/s operating digital DLL architecture for 512-Mb DDR,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 194-206, Jan. 2004.
[28] S. Kim et al., “A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM,” IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726-734, Jun. 2002.
[29] J.-B. Lee et al., “Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin x16 DDR SDRAM,” in IEEE ISSCC Dig. Tech. Papers, pp. 68-69, 2001.
[30] S. Kuge et al., “A 0.18um 256-Mb DDR-SDRAM with low-cost post-mold tuning method for DLL replica,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 726-734, Nov. 2000.
[31] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 131-140, Jan. 2012.
[32] Y. K. Kim et al., “A 1.5V, 1.6Gb/s/pin, 1Gb DDR3 SDRAM with an address queuing scheme and bang-bang jitter reduced DLL scheme,” in IEEE Symp. VLSI Dig. Tech. Papers, pp. 182-183, 2007.
[33] K.-H. Kim et al., “A 1.4 Gb/s DLL using 2nd order charge-pump scheme with low phase/duty error for high-speed DRAM application,” in IEEE ISSCC Dig. Tech. Papers, pp. 213-214, 2004.
[34] J.-H. Lee et al., “A 330 MHz low-jitter and fast-locking direct skew compensation DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 352-353, 2000.
[35] J. Kim et al., “A low-jitter mixed-mode DLL for high-speed DRAM applications,” IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1430-1436, Oct. 2000.
[36] H.-W. Lee et al., “A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase- and delay-locked loop using power-noise management with unregulated power supply in 54nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 140-141, 2009.
[37] S.-J. Bae et al., “A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a Programmable DQ Ordering Crosstalk Equalizer and Adjustable clock-Tracing BW,” in IEEE ISSCC Dig. Tech. Papers, pp. 498-500, 2011.
[38] K.-H. Kim et al., “A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded pre-emphasis transmitter,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 127-134, Jan. 2006.
[39] H. Partovi et al., “Single-ended transceiver design techniques for 5.33Gb/s graphics applications,” in IEEE ISSCC Dig. Tech. Papers, pp. 136-137, 2009.
[40] Y. Hidaka, “Sign-based-Zero-Forcing Adaptive Equalizer Control,” in CMOS Emerging Technologies Workshop, May 2010.
[41] S.-J. Bae et al., “A 60nm 6Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-reduction techniques,” in IEEE ISSCC Dig. Tech. Papers, pp. 278-279, 2008.
[42] K.-I. Oh et al., “A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk suppression scheme,” IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222-2232, Aug. 2009.
[43] S. H. Wang et al., “A 500-Mb/s quadruple data rate SDRAM interface using a skew cancellation technique,” IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 648-657, Apr. 2001.
[44] W. Hubert et al., “GDDR5 training-challenges and solution for ATE-based test,” in Asian Test Symposium, pp. 24-27, Nov. 2008.
[45] JEDEC, JESD212.
[46] K. Sohn et al., “A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme,” in IEEE ISSCC Dig. Tech. Papers, pp. 38-40, 2012.
[47] C. Park et al., “A 512-mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.
[48] D. Lee et al., “Multi-slew-rate output driver and optimized impedance-calibration circuit for 66nm 3.0Gb/s/pin DRAM interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 280-613, 2008.
[49] J. Koo et al., “Small-area high-accuracy ODT/OCD by calibration of global on-chip for 512M GDDR5 application,” in Proc. IEEE CICC, pp. 717-720, Sep. 2009.
[50] S.-S. Yoon et al., “A fast GDDR5 read CRC calculation circuit with read DBI operation,” in IEEE Asian Solid-State Circuits Conference, pp. 249-252, 2008.
[51] H.-W. Lee et al., “A 283.2μW 800Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 48-50, 2012.
[52] U. Kang et al., “8Gb 3D DDR3 DRAM using through-silicon-via technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 130-131, 2009.
[53] D. Malta et al., “Integrated process for defect-free copper plating and chemical-mechanical polishing of through-silicon vias for 3D interconnects,” in ECTC, pp. 1769-1775, 2010.
[54] A.-C. Hsieh et al., “TSV redundancy: architecture and design issues in 3-D IC,” IEEE Trans. VLSI Systems, pp. 711-722, Apr. 2012.