A 10Gbps/port 8x8 Shared Bus Switch with embedded DRAM Hierarchical Output Buffer
Kangmin Lee*, Se-Joong Lee, and Hoi-Jun Yoo
Semiconductor System Laboratory, Department of EECS
Korea Advanced Institute of Science and Technology
ESSCIRC 2003
Outline
• Introduction & Motivation
• Hierarchical Output Buffering Technique
• Simulation Results
• Implementation
• Measurement Result
• Conclusion
Introduction
• Simplified Router System
[Figure: simplified router system. Labels: NP RX input ports and NP TX output ports (1, 2, ..., 8) at 10Gbps (1GHz x 10b); S/P and P/S converters; 8x8 shared bus switch with a 512b shared bus at 160MHz x 512b (BW = 80Gbps); per-output AF and FIFO blocks; 20MHz x 512b to 1GHz x 10b on the output side; callout "Bottleneck: 90Gbps FIFO buffer".]
Motivation
• FIFO Requirements
  – Max. Input BW: 80Gbps
  – Max. Output BW: 10Gbps
  – Buffer Capacity: 1Mbits (= 2048 packets)
(1) Dual Port SRAM
  – t_cycle = 6.25nsec
  – Area: 16mm2 @ 0.18µm CMOS
(2) Parallel eDRAM buffer
  – t_cycle = 40nsec
  – 9 eDRAM MACROs
  – Cell efficiency is degraded.
    • Area: 20mm2 incl. Bus and Latch
[Figure: the 1Mb FIFO (512b in at 80Gbps, 512b out at 10Gbps) and its parallel eDRAM realization: eDRAM1 to eDRAM9, each with a latch (LAT), attached to the 512b shared bus; 80Gbps in, 10Gbps out.]
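The cycle times and packet count on this slide follow from simple width-times-clock arithmetic. The sketch below rechecks them, assuming one packet is one 512-bit bus word and that "1Mbits" means 2^20 bits; the helper names are hypothetical and only serve the illustration.

```python
def bandwidth_gbps(width_bits: int, clock_mhz: float) -> float:
    """Raw bandwidth of a parallel port (width x clock), in Gbps."""
    return width_bits * clock_mhz * 1e6 / 1e9


def cycle_time_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock given in MHz."""
    return 1e3 / clock_mhz


PACKET_BITS = 512  # assumption: one packet = one 512b bus word

# Shared bus from the router figure: 160MHz x 512b.
shared_bus_gbps = bandwidth_gbps(512, 160)

# A dual-port SRAM written once per bus cycle needs this cycle time.
sram_cycle_ns = cycle_time_ns(160)

# A 40nsec eDRAM cycle corresponds to this clock frequency in MHz.
edram_clock_mhz = 1e3 / 40

# 1Mbit (assumed 2**20 bits) expressed in 512b packets.
capacity_packets = (1 << 20) // PACKET_BITS

print(shared_bus_gbps, sram_cycle_ns, edram_clock_mhz, capacity_packets)
```

The raw bus figure comes out slightly above the 80Gbps quoted on the slide, which is consistent with 8 ports at 10Gbps each fitting on the bus.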
Hierarchical Output Buffer
[Figure: SRAM (large bandwidth, small capacity) and DRAM (small bandwidth, large capacity) are combined into a hierarchical buffer = SRAM + eDRAM. A funnel-like SRAM accepts the 80G input and passes it on at an intermediate bandwidth K (30Gbps) to the DRAM FIFO, which drains at 10G. Bandwidth-versus-time sketches: input b/w (max. 80G) and regulated b/w.]
Hierarchical Output Buffer (Cont’d)
[Figure: HOB FIFO. Labels: irregular 80Gbps input; dual-port SRAM with 512-bit I/O; regulated 30Gbps stream; DMUX; four eDRAMs (paths labeled 10Gbps); MUX with 512-bit I/O; 10Gbps output; address manager.]
• Determination of K, SRAM and eDRAM capacity
  – Tradeoff between area cost and switch performance
  – Target Performance
    • Packet loss probability: < 10^-6 @ 90% offered load
    • Packet latency: < 100 cycles
  – K = 30Gbps
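The interaction between K and the two capacities can be explored with a toy slot-based model. This is only a sketch and not the authors' simulator: it assumes one time slot equals the time to send one cell at 10Gbps, so the 80Gbps input is at most 8 cells per slot and K = 30Gbps is 3 cells per slot; it also assumes that cells are dropped only when the SRAM is full and that the SRAM simply stalls when the DRAM is full. The function name and parameters are hypothetical.

```python
def simulate_hob(arrivals, sram_cells=64, dram_cells=1024,
                 k_per_slot=3, out_per_slot=1):
    """Toy two-stage buffer: SRAM funnel -> DRAM FIFO -> output.

    arrivals: cells arriving in each slot (0..8 for an 80Gbps input).
    Returns (delivered, dropped, sram_occupancy, dram_occupancy).
    """
    sram = dram = delivered = dropped = 0
    for a in arrivals:
        # 1) Arrivals enter the SRAM; overflow is dropped (assumption).
        accepted = min(a, sram_cells - sram)
        sram += accepted
        dropped += a - accepted
        # 2) SRAM drains into DRAM at no more than K cells per slot,
        #    limited by the free space left in the DRAM.
        moved = min(k_per_slot, sram, dram_cells - dram)
        sram -= moved
        dram += moved
        # 3) DRAM drains to the output port at line rate.
        sent = min(out_per_slot, dram)
        dram -= sent
        delivered += sent
    return delivered, dropped, sram, dram


# A short full-rate burst followed by idle slots is absorbed without loss.
short_burst = simulate_hob([8] * 10 + [0] * 100)

# A longer full-rate burst overflows the 64-cell SRAM.
long_burst = simulate_hob([8] * 20)
print(short_burst, long_burst)
```

Sweeping `k_per_slot`, `sram_cells`, and `dram_cells` against a traffic trace is one way to reproduce the kind of area-versus-loss tradeoff the slide describes; real results depend on the traffic model, which this sketch does not attempt to match.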
Hierarchical Output Buffer (Cont’d)
• Simulation Results (K = 30Gbps)
  – Buffer Capacity: SRAM: 64 packets (= 4KBytes); DRAM: 1024 packets (= 0.5Mbits)
  – Latency / Packet Loss Rate: latency 100 cycles (= 4.8µsec); packet loss rate ~10^-6
  – Simulation Inputs: trace of real Internet Protocol packets
[Figure, left: buffer size of SRAM [cells] (16 to 80) and of eDRAM [cells] (1 to 10000, log scale) versus offered load [%] (20 to 100); 1 cell = 64 Bytes.]
[Figure, right: latency [cell-time] (1 to 1000, log scale) and cell loss probability (1E-8 to 1, log scale) versus offered load [%] (20 to 100); 1 cell-time = 48nsec.]
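The parenthesized conversions in the simulation results can be rechecked from the two unit definitions given with the plots (1 cell = 64 Bytes, 1 cell-time = 48nsec). The sketch assumes that a "packet" equals one cell, that a "cycle" equals one cell-time, and that Mbits are binary (2^20 bits).

```python
CELL_BYTES = 64     # "1 cell = 64 Bytes" (from the plot annotation)
CELL_TIME_NS = 48   # "1 cell-time = 48nsec" (from the plot annotation)

# SRAM: 64 cells expressed in bytes.
sram_bytes = 64 * CELL_BYTES

# DRAM: 1024 cells expressed in binary megabits (assumption: 2**20 bits).
dram_mbits = 1024 * CELL_BYTES * 8 / 2**20

# Latency: 100 cell-times expressed in microseconds.
latency_us = 100 * CELL_TIME_NS / 1000

print(sram_bytes, dram_mbits, latency_us)
```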
Implementation of a Prototype
[Figure: prototype block diagram. Labels: ROM 1 to ROM 4 with run-time traffic control; 512b input bus; PLL; HOB FIFO with dual-port SRAM, repeaters, controller, and four DRAM macros with 256b I/O.]
• Input Packet Generator
  – 20Gbps / ROM x 4 = 80Gbps Traffic Emulation
  – Run-Time Traffic Control
• PLL
  – Generates multiple clocks: 200MHz for SRAM, 25MHz for eDRAM
Implementation (Cont’d)
[Figure: prototype block diagram. Labels: input generator; 512b input bus; PLL; HOB FIFO with dual-port SRAM, repeaters, and four DRAM macros with 256b I/O.]
• Dual Port SRAM
  – 200MHz, 512b I/O, 64 words (4kB)
  – 1 Write Port, 1 Read Port
  – 4.5mm2
Implementation (Cont’d)
[Figure: prototype block diagram with a detail of one DRAM macro. Labels: DRAM array (512x512); sense amps; latches (128x2); DRAM controller; HOB controller.]
• eDRAM MACRO
  – 25MHz, 512b I/O, 512 words x 4 MACROs (= 1Mb)
  – Dual I/O Scheme for huge I/O bandwidth
  – 3.4mm2
Implementation (Cont’d)
• Dual I/O Scheme
  – Doubles the I/O bandwidth
  – I/O width = page size: energy efficient
[Figure: one eDRAM MACRO (512x512) with its dual-I/O interface. Labels: cell array; sense amps; write drivers (WDRV x 64); 128b and 256b data paths; 512b I/O. Dual I/O circuits: latches and MUX/DEMUX between the eDRAM and the I/O, with clock pulse, burst select, and odd / even data for write and read.]
Die Photo
[Die photo. Labels: ROM, SRAM, DRAM, S-D bus, PLL; 0.16µm DRAM process; 0.35µm DRAM peripheral process.]
• 0.16µm DRAM Process
• Die Area: 6 x 11mm2
• HOB FIFO: 14.7mm2 (7.6mm2 @ 0.18µm SRAM)
Measurements
Chip On Board Measurement Setup
Waveforms
Measurements
[Waveforms, 25ns/div: (1) 25MHz DRAM clock; (2) ROM1 enable; (3) ROM to SRAM: 200MHz; (4) eDRAM MACRO output (1 1 1 0 0).]
[Figure: measured data path ROM, SRAM, DRAM with PLL. Labels: 200MHz x 512b; 100MHz x 512b; 25MHz x 512b; 4; 100Gbps; 12.5Gbps; probe points (1) to (4).]
Conclusion
• Hierarchical Output Buffering (HOB) technique is proposed for a 10Gbps 8x8 shared bus switch
  – Area reduction
    • 7.6mm2 @ 0.18µm embedded DRAM process (less than 50% of the conventional approach)
  – Performance summary
    • Max. bandwidth of 90Gbps with 1Mb capacity
    • Latency: 100 cycles; packet loss rate: 10^-6
• Dual I/O scheme expands the I/O width to 512 bits