Microsoft PowerPoint - PhD-defense
Analysis and Implementation of Optoelectronic Network Routers
Ph.D. Defense by
Mongkol Raksapatcharawong
SMART Interconnects Group, Electrical Engineering - Systems Department
University of Southern California - LA
http://www.usc.edu/dept/ceng/pinkston/SMART.html
Date: September 25, 1998
Time: 12:00pm, EEB-108
The Big Picture
The Problem: Network bandwidth is becoming a bottleneck. Interconnection networks must deliver sufficient bandwidth to keep pace with microprocessors.
The Unknowns: Performance issues, design issues, and technology issues.
Potential Solution: Optoelectronic network routers. Optoelectronic technology increases physical bandwidth. Advanced router architectures improve bandwidth utilization.
Outline
■ Background and Motivation
■ Research Issues and Approach
■ Modeling Free-Space Optical k-ary n-cube Wormhole Networks
■ Design Issues of Optoelectronic Network Routers
■ Implementing Optoelectronic Network Routers
■ Conclusions and Future Work
Problem
☛ Starvation for off-chip bandwidth.
☛ On-chip clock rates are doubling compared to off-chip clock rates.
☛ Processor-memory bandwidth is doubling.
☛ Possible solution: integrate processor and memory onto one chip: IRAM (Patterson 1995).
☛ Problem: this shifts the bandwidth problem to the network in multiprocessor systems.
[Figure: clock rate (MHz, log scale from 1 to 10000) versus year (1980 to 2010). Series: Intel proc, Intel bus, SIA proc, SIA bus. Labeled processors: 8088, 80286, 80386, 80486, Pentium, Pentium Pro, Pentium II, Merced (IA-64).]
Problem (cont’d)
■ Processor performance has increased much faster than memory.
■ Memory latency hiding/tolerating techniques are required.
■ Prefetching and multithreading pipeline the memory accesses and thread executions.
■ Both schemes generate more off-chip traffic.
A high-bandwidth network is required.
[Figure: sustained bandwidth (GB/s, 0 to 60) versus year (1998 to 2007). Curves: multithreaded processor, single-threaded processor, available off-chip bandwidth.]
Demand on Network Bandwidth
☛ Multiprocessor systems require a high-performance network.
☛ The network router must be fast and provide high bandwidth.
Optoelectronic routers can help mitigate the bandwidth problem.
State-of-the-Art Network Routers

Router [reference, year]                     On-chip/off-chip clock rates (MHz)   Internal/external channel width (bits)
SGI Spider [Galles, 1996]                    100/200 (double-edge)                80/20
Intel Teraflop [Carbonaro/Verhoorn, 1996]    200/200                              16/16
Cray T3E [Scott/Thorson, 1996]               75/375                               70/14
Reliable Router [Dally et al., 1994]         100/100 (double-edge)                32/23

■ All routers employ sophisticated architectural techniques, e.g., adaptive routing, pipelined functions, etc.
■ Network routers are mostly designed according to the available off-chip bandwidth and barely take advantage of state-of-the-art semiconductor technology.
Limited off-chip bandwidth limits the performance of network routers.
State-of-the-Art Interconnects

Interconnect [reference, year]                   Transmission rate (GHz)   Channel width (bits)
Equalized serial line [Dally, 1996]              4                         1
Bidirectional signaling [Haycock/Mooney, 1997]   2.5                       8
POLO [USC/HP, 1996]                              1                         10
Optobus II [Motorola, 1995]                      0.8                       10
ChEEtah [USC/Honeywell, 1997]                    > 1.0                     12

■ High-performance electrical interconnects suffer more from signal skew and jitter, and usually operate in serial mode.
■ Optical interconnects suffer less from the same effects and operate with wider channels.
■ In exchange for higher performance, electrical interconnects require large and sophisticated transceiver circuits.
Optical interconnects show a potentially better price/performance.
Transceiver Sizes Comparison
Equalized Line Transmitter [Dally 96]: size 550µm x 900µm (5525%), speed 4GHz
I/O pad driver [Tanner Research 97]: size 80µm x 112µm (100%), speed ~200MHz
Optoelectronic Transmitter [Lucent 97]: size 17µm x 11µm (2.1%), speed 2.48GHz
Optoelectronic Receiver [Lucent 97]: size 17µm x 13µm (2.5%), speed 2.48GHz
Sizes are based on 0.5 µm (CMOS-HP 14B) technology.
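The percentages in the transceiver comparison appear to be footprint areas relative to the I/O pad driver. A minimal sketch of that arithmetic, using only the dimensions listed on the slide (the dictionary and function names are illustrative, not from the original):

```python
# Footprints in micrometres (width, height), as listed on the slide.
FOOTPRINTS_UM = {
    "Equalized line transmitter": (550, 900),
    "I/O pad driver": (80, 112),
    "Optoelectronic transmitter": (17, 11),
    "Optoelectronic receiver": (17, 13),
}


def relative_area_percent(name, baseline="I/O pad driver"):
    """Area of `name` as a percentage of the baseline footprint area."""
    width, height = FOOTPRINTS_UM[name]
    base_width, base_height = FOOTPRINTS_UM[baseline]
    return 100.0 * (width * height) / (base_width * base_height)


if __name__ == "__main__":
    for name in FOOTPRINTS_UM:
        print(f"{name}: {relative_area_percent(name):.1f}% of the pad driver area")
```

Under this reading, the optoelectronic transmitter and receiver each occupy a few percent of a conventional pad driver, while the equalized line transmitter is roughly 55 times larger.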
Previous Work in Complex Optoelectronic Chips
■ the AMOEBA switch chip by Krishnamoorthy et al., 1996
■ a 64-bit microprocessor core by Kiamilev et al., 1996
■ the Optical Multiprocessor Network Interface (OMNI) chip by Pinkston and Seelan, 1996
■ a 1kbit photonic page buffer by Krishnamoorthy et al., 1996
■ a 16kbit photonic page buffer by Kiamilev et al., 1997
■ a multiply-accumulate DSP core by Rozier et al., 1998
Previous work focused on design and implementation, not performance evaluation of complex optoelectronic chips in general.
CMOS and SEED Technology Trends [SIA 97 and Krishnamoorthy 96]

Year of first shipment            1999       2001       2003       2006        2009
Technology (µm)                   0.18       0.15       0.13       0.10        0.07
# Transistors (millions)          6.2        10         18         39          84
On-chip/Off-chip Clocks (MHz)     1250/480   1500/785   2100/885   3500/1035   6000/1285
# Pin-outs Required (pins)        1570       2000       2400       3270        4400
# BGA Package Pin-outs (pins)     1500       1800       2200       3000        4100
# SEEDs (per chip)                8000       12000      20000      35000       47000
Bonding Pad size (µm)             9          8          7          5           4

Optoelectronic SEED technology shows the potential to sustain the increasing bandwidth requirement.
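To make the I/O gap in the trends table concrete, the sketch below divides the projected number of SEEDs per chip by the projected BGA package pin-outs for each year. The values are copied from the table; the function name is illustrative. Note that these are raw device counts: with dual-rail signaling, the number of optical signals would be half the number of SEEDs.

```python
# Projections copied from the technology trends table.
YEARS = [1999, 2001, 2003, 2006, 2009]
BGA_PINS = [1500, 1800, 2200, 3000, 4100]
SEEDS_PER_CHIP = [8000, 12000, 20000, 35000, 47000]


def seed_to_bga_ratio():
    """Return {year: SEEDs per chip divided by BGA package pin-outs}."""
    return {
        year: seeds / pins
        for year, seeds, pins in zip(YEARS, SEEDS_PER_CHIP, BGA_PINS)
    }


if __name__ == "__main__":
    for year, ratio in seed_to_bga_ratio().items():
        print(f"{year}: {ratio:.1f}x more SEEDs than BGA pins")
```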
WARRP Router: Complexity and I/O Pin-out Requirement
☛ Electronic I/O (BGA packaging) is a limiting factor.
Network routers can benefit from the large number of I/O pin-outs provided by CMOS/SEED.
[Figure: number of transistors (millions, log scale from 0.01 to 100) versus number of pin-outs (log scale from 10 to 10000) for WARRP configurations with 1, 2, and 3 virtual channels (1-VC, 2-VC, 3-VC). Labeled configurations: 1D-4B-Uni, 1D-8B-Uni, 1D-16B-Uni, 2D-8B-Bi, 8D-16B-Bi, 8D-64B-Bi, 8D-256B-Bi. The plot separates electronic-based pin-outs (BGA packaging limits for 1995, 2003, and 2009) from CMOS/SEED-based pin-outs, and marks commercial routers (ServerNet II, Intel Teraflop, Cray T3E, SGI Spider) along with WARRP II, Mosaic C (1992), and Mosaic (1987).]
Proposed Solution
Optoelectronic Network Router based on the WARRP (Wormhole Adaptive Recovery-based Routing via Preemption) Architecture:
☛ dense optoelectronic I/O devices: provide design flexibility
☛ high-speed signaling: enables the design of high-performance network routers
☛ increased bandwidth: allows advanced network router architectures
The proposed solution is potentially advantageous in the development of next-generation network routers.
Research Issues and Approach
Optoelectronic network routers:
☛ How do they benefit the multiprocessor network? Use an analytical model based on the widely employed k-ary n-cube class of networks.
☛ What are the issues pertinent to the development of such routers? Use CAD tools and a semi-empirical model based on the WARRP router to identify the problems and evaluate the chips’ performance.
☛ Can they be implemented? Implement the WARRP router through various optoelectronic integration technologies.
Implementation Cost Model: Connection Capacity
■ Bisection Width [Dally 90] is the number of connections crossing an imaginary plane dividing the system into two equal halves; useful for electrically interconnected systems.
■ Connection Capacity [Mongkol & Pinkston 96] is introduced as the number of connections that can be established for a given imaging system; useful for 3-D free-space optical interconnects.
Bisection Width and Connection Capacity of k-ary n-cube Networks
Bisection width: \( B = 2Wnk^{n-1} \)
Connection capacity: \( C = Wnk^{n} \)
Where n is the network dimension, k is the network radix, and W is the channel width.

Bisection Width and Connection Capacity Comparison

                      System A (16-node torus)   System B (8-node hypercube)
Bisection width       8                          8
Connection capacity   32                         24

[Figure: Diffractive-Reflective Optical Interconnect (DROI), showing the mirror plane, the microlens-hologram plane, the bisection plane, and the optical signal path (only one row is shown).]
➨ A system with a connection capacity of 24 can implement only System B, though both systems have similar bisection width.
Connection capacity is a more accurate implementation cost measure.
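The connection capacity values in the comparison can be reproduced from C = W·n·k^n. The sketch below assumes System A is a 4-ary 2-cube and System B is a 2-ary 3-cube, both with W = 1; those parameter choices are inferred from the node counts and are not stated on the slide. The function names are illustrative.

```python
def connection_capacity(k, n, w=1):
    """Connection capacity C = W * n * k**n of a k-ary n-cube."""
    return w * n * k ** n


def can_implement(required_capacity, available_capacity):
    """True if the imaging system offers enough connections for the network."""
    return required_capacity <= available_capacity


if __name__ == "__main__":
    system_a = connection_capacity(k=4, n=2)  # 16-node torus, assumed 4-ary 2-cube
    system_b = connection_capacity(k=2, n=3)  # 8-node hypercube, assumed 2-ary 3-cube
    print("System A:", system_a, "fits in 24:", can_implement(system_a, 24))
    print("System B:", system_b, "fits in 24:", can_implement(system_b, 24))
```

With an imaging system that supports 24 connections, only System B fits, which is the point the comparison makes.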
Network Latency for Wormhole Switched Networks
\[ T_{net} = D\,(t_r + t_s + t_w) + \max(t_s, t_w)\,\frac{L}{W} \]
Where T_net is the low-load network latency, D is the number of network hops from source to destination, L is the data message length, W is the channel width, t_r is the routing time, t_s is the data-thru time, and t_w is the wire delay time.
■ Effects of optoelectronic technology on network latency:
■ Dense I/O pin-outs affect network topology (D) and channel width (W); and
■ High-speed signaling reduces propagation delay (t_w).
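A minimal sketch of the low-load latency expression, T_net = D(t_r + t_s + t_w) + max(t_s, t_w)·L/W, may help show how channel width enters. The numeric values below are hypothetical and chosen only for illustration; they are not taken from the defense.

```python
def network_latency(d, l, w, t_r, t_s, t_w):
    """Low-load wormhole latency: D*(t_r + t_s + t_w) + max(t_s, t_w) * L/W.

    d: hops from source to destination
    l: message length in bits
    w: channel width in bits
    t_r, t_s, t_w: routing, data-thru, and wire delay times (same time unit)
    """
    return d * (t_r + t_s + t_w) + max(t_s, t_w) * (l / w)


if __name__ == "__main__":
    # Hypothetical parameters: 4 hops, 128-bit message, 5 ns per delay term.
    narrow = network_latency(d=4, l=128, w=16, t_r=5, t_s=5, t_w=5)
    wide = network_latency(d=4, l=128, w=32, t_r=5, t_s=5, t_w=5)
    print(f"16-bit channel: {narrow} ns, 32-bit channel: {wide} ns")
```

Doubling W halves only the serialization term, so the benefit of wider channels grows with message length.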
Other Important Equations for Performance Evaluation
[Equations: the following definitions appear on this slide.
Interconnection distance: R_max, defined piecewise in terms of p, k, n, and sin θ, with separate cases for k = 2, k = 4, and any other k.
Connection capacity: C, expressed in terms of A_system and M_D.
Channel width: W_optics(n, k), expressed in terms of A, M_D, N, log N, and log k; and W_elec(n, k), expressed in terms of A, L, T_w, N, and k.
A_system = F(...) is a function of parameters including h, p, n, k, and θ; θ = F(...) is a function of parameters including λ, n, and L.]
Channel Cycle Time (T_C)
■ Assuming T_c is determined by propagation delay.
■ \( T_c = T_{o/e} + T_{e/o} + T_{prop} \), where T_o/e + T_e/o is the conversion delay (technology dependent) and T_prop is the propagation delay (topology dependent).
■ Conversion time is not a killer!!
■ It is very important to have an efficient imaging system.
[Figure: R_max (cm, 0 to 75) and T_c (ns, 0.85 to 4.85) versus link efficiency η (received power, mW) from 0.1 to 0.9, with a topology-dependent region, a technology-dependent region, and T_c-int = 4 ns marked.]
[Figure: delay (ns, 0 to 3.5) versus link efficiency η (received power, mW) and interconnection distance (m) for T_o/e, T_e/o, and T_prop, with the crosspoint marked.]
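The decomposition T_c = T_o/e + T_e/o + T_prop can be sketched numerically. The example below assumes the optical signal travels through free space at the vacuum speed of light, which is a simplification introduced here; the conversion delays in the usage example are hypothetical placeholders, not measured values from the defense.

```python
SPEED_OF_LIGHT_CM_PER_NS = 29.9792458  # vacuum speed of light in cm/ns


def propagation_delay_ns(distance_cm):
    """Free-space propagation delay, assuming the signal travels at c."""
    return distance_cm / SPEED_OF_LIGHT_CM_PER_NS


def channel_cycle_time_ns(t_oe_ns, t_eo_ns, distance_cm):
    """T_c = T_o/e + T_e/o + T_prop, with T_prop derived from path length."""
    return t_oe_ns + t_eo_ns + propagation_delay_ns(distance_cm)


if __name__ == "__main__":
    # Hypothetical conversion delays of 0.5 ns each over two path lengths.
    for distance_cm in (15.0, 60.0):
        t_c = channel_cycle_time_ns(0.5, 0.5, distance_cm)
        print(f"{distance_cm} cm path: T_c = {t_c:.2f} ns")
```

For short paths the fixed conversion delay dominates (the technology-dependent region); for long paths the propagation term dominates (the topology-dependent region).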
Performance Evaluation
Optics vs Electronics (64-node system)

Parameters for ELECTRICAL system:
chip area: 1 in²
PCB size: 12 x 12 in²
# of layers: 20
min. connection length (p): 1.5 in

Parameters for OPTICAL system:
laser wavelength (λ): 850 nm
VCSEL beam radius: 5 µm
VCSEL output power: 1 mW
P-I-N detector size: 15 x 15 µm²
microlens diameter: 125 µm
link efficiency (η): ~63%
chip area: 1 cm²
interconnection area: 12 x 12 cm²
usable microlens area (A): 64 cm²
min. connection path (p): 1.5 cm
max. deflection angle (θmax): ~24°
Channel Width and Network Latency
[Figure: channel width (bits/channel, 0 to 180) versus dimension n (2 to 6) for elec and optics.]
[Figure: network latency (ns, 0 to 400) versus dimension n (2 to 6) for elec, optics, and optics (200MHz).]
■ Optics could provide about an order of magnitude higher connectivity than electronics.
■ Optics still yields about twice the channel width of electronics. Hence, network latency is lower!
■ Even if channel cycle time is determined by internal router delay, a wider channel still greatly benefits the network latency (shown as optics (200MHz)).
Packaging Issues: Power Dissipation
[Figure: latency (ns, 0 to 90) versus dimension n (2 to 6) for elec, optics, and laser (low).]
[Figure: channel width (bits/channel, 0 to 600) versus dimension n (2 to 6) for elec, optics, and laser (low).]
■ Limited cooling capability reduces the achievable I/O pin-outs in optics.
■ Optics still yields lower network latency due to faster achievable cycle time.
Packaging and Device Tolerances
[Figure: transmitter (TX) and receiver (RX) pair illustrating lateral, longitudinal, and angular misalignment.]
■ Lateral misalignment: ∆Lat = 102µm
■ Longitudinal misalignment: ∆Long = 230µm
■ Angular misalignment: ∆θ = 0.044°
■ Wavelength variation: ∆λ = 0.8nm
Optoelectronic Network Routers: How beneficial?
Multiprocessor networks can benefit from optoelectronic routers in two ways:
■ A large number of I/Os allows more design flexibility, i.e., a wide range of topologies is efficiently supported.
■ High-speed optical signaling unleashes the power of high-performance network routers by fully utilizing advanced semiconductor technology.
Given that:
■ Better packaging technology (including cooling techniques, micro-optic alignment techniques, etc.) and optoelectronic devices with more uniform characteristics are available.
■ The bottom line: optoelectronics and its related technologies are progressing at an impressive rate and, hence, the above conclusions are becoming a near-term reality.
Pixel-based vs. Core-based CMOS/SEED Designs
Pixel-based designs:
☛ small (self-contained) circuitry
☛ implements simpler functions
☛ connections are local and regular
Core-based designs:
☛ large (non-self-contained) circuitry
☛ implements complex functions
☛ connections are global and less regular
[Figure: the TRANSPAR chip (courtesy A. Sawchuk, USC), with a pixel marked, and the WARRP II chip (SMART group, USC), with the core marked.]
Design issues exist in implementing core-based designs!
Core-based CMOS/SEED Design Issues
☛ A large number of SEED transceivers must be integrated with the CMOS core.
☛ CMOS I/O ports are not perfectly aligned with the SEED array.
☛ At least the top metal layer is reserved exclusively for SEED wiring to simplify CMOS/SEED integration.
☛ A space-invariant imaging system requires structured I/Os on the chip.
Wiring in core-based designs is a problem.
Consequences:
☛ Connections from transceivers to CMOS I/O ports and/or bonding pads are longer.
☛ Fewer wiring and area resources remain for CMOS circuitry, reducing transistor density.
☛ Critical paths may lengthen, reducing achievable on-chip clock rates.
Solutions for the Wiring Problem
☛ Manual integration (simpler, more primitive method)
☛ CMOS core and SEED array are separately designed and do not fully overlap (e.g., WARRP II and 64-bit processor core [Kiamilev et al., MPPOI 96]).
(+) Compatible with CMOS CAD tools.
(−) Chip resources are poorly optimized, which seriously degrades chip performance.
(−) Impractical for large core-based designs.
Core-based Designs using Manual Integration
The 1kbit Photonic Page-buffer chip [Krishnamoorthy et al., AO 96]
SRAM cells and datapath circuits with the SEED array on top.
SEED transceivers are located on the periphery.
(+) Simplifies the wiring problem
(+) Compatible with existing CAD tools
(−) Very long connections
(−) May increase critical paths
(−) Low chip area utilization
[Figure: chip layout showing the SEED and receiver array, the SEED and transmitter array, the SRAM cells, and the datapath.]
Core-based Designs using Manual Integration (cont’d)
The 16kbit Photonic Page-buffer chip [Kiamilev et al., IJO 97]
CMOS circuits are placed on the periphery of the SEED array and corresponding transceivers.
(+) Simplifies the wiring problem
(+) Compatible with existing CAD tools
(+) Reduces connection length
(+) Improves signal integrity
(−) Low chip area utilization
Solutions for the Wiring Problem (cont’d)
☛ Automatic integration (under development)
☛ CMOS core and SEED array are simultaneously optimized by the CAD tool.
(+) Higher chip performance can be achieved.
(+) Practical for large core-based designs.
(−) Requires optoelectronic-compatible CAD tools.
(−) Effects of long connections and lower transistor density still exist.
Automatic integration is the more efficient and preferred method.
Core-based Design using Automated CAD Tools
The Multiply-accumulate chip [Rozier et al., LEOS 98], designed using EPOCH and EGGO CAD tools.
CMOS circuits, SEED array, and SEED transceivers are fully overlapped.
(+) Directly tackles the wiring problem
(+) Improves chip resource utilization
(+) Mitigates the effects of longer connections and lower transistor density
(−) Requires optoelectronic-compatible CAD tools
Cost Estimation of Core-based CMOS/SEED Designs
Inputs to the wiring cost model and the wiring capacity model:
SEED parameters: bonding pad size, number of SEEDs, and SEED pitches.
Wiring parameters: number of available metal layers, signal types (single or dual-rail), routing style, wiring utilization, and metal pitches.
System-level parameter: interconnection pattern (optical imaging system constraints).
Wiring utilization is determined by synthesis of the WARRP router using the EPOCH tool.
Output: number of metal layers required by SEED wiring, which is used to:
☛ Estimate transistor density
☛ Estimate critical path length
☛ Estimate aggregate off-chip bandwidth
SEED and Wiring Parameters
Where:
D is the total number of SEED diodes,
D_x is the number of SEEDs in the x-direction,
D_y is the number of SEEDs in the y-direction,
P is the bonding pad size,
X_pitch is the pitch of the diodes in the x-direction,
Y_pitch is the pitch of the diodes in the y-direction,
M_X-pitch is the pitch of the metal layer in the x-direction,
M_Y-pitch is the pitch of the metal layer in the y-direction.
[Figure: array of SEEDs with their bonding pads, annotated with X_pitch, Y_pitch, P, M_X-pitch, and M_Y-pitch.]
We need to find the wiring capacity provided by the space in the SEED array and the wiring cost required to connect all SEEDs.
Wiring Capacity and Wiring Cost Models
Assumptions:
☛ Signals are dual-rail.
☛ Wiring is X-Y style and requires at least 2 metal layers.
☛ SEEDs and CMOS I/O ports are placed randomly (worst case).
Wiring capacity in x- and y-directions:
\[ X_C = K_i \cdot \frac{Y_{pitch}}{M_{X\text{-}pitch}} \cdot D \qquad Y_C = K_j \cdot \frac{X_{pitch} - P}{M_{Y\text{-}pitch}} \cdot D \]
Wiring cost in x- and y-directions:
\[ X_R = \frac{D}{2} \cdot \frac{D_x}{2} \qquad Y_R = \frac{D}{2} \cdot \frac{D_y}{2} \]
K_i and K_j are the wiring utilizations of metal layers i and j; typical values are 65% to 75%.
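One way to read the wiring models is as capacity X_C = K_i·(Y_pitch / M_X-pitch)·D and Y_C = K_j·((X_pitch − P) / M_Y-pitch)·D, against costs X_R = (D/2)·(D_x/2) and Y_R = (D/2)·(D_y/2). The sketch below encodes that reading. Taking the number of metal layers as the ceiling of cost over capacity is an assumption made here for illustration, not a rule stated on the slides, and all numeric inputs in the example are hypothetical.

```python
import math


def wiring_capacity_x(k_i, y_pitch, m_x_pitch, d):
    """x-direction capacity: utilization * tracks per SEED row gap * number of SEEDs."""
    return k_i * (y_pitch / m_x_pitch) * d


def wiring_capacity_y(k_j, x_pitch, pad, m_y_pitch, d):
    """y-direction capacity; the bonding pad width reduces the usable gap."""
    return k_j * ((x_pitch - pad) / m_y_pitch) * d


def wiring_cost(d, d_axis):
    """Cost along one axis under random placement: (D/2) * (D_axis/2)."""
    return (d / 2) * (d_axis / 2)


def layers_required(cost, capacity):
    """Assumed rule: enough layers so that total capacity covers the cost."""
    return math.ceil(cost / capacity)


if __name__ == "__main__":
    # Hypothetical array: 8000 SEEDs arranged 100 x 80, pitches in micrometres.
    d, d_x, d_y = 8000, 100, 80
    cap_x = wiring_capacity_x(k_i=0.7, y_pitch=60, m_x_pitch=1.2, d=d)
    cap_y = wiring_capacity_y(k_j=0.7, x_pitch=60, pad=9, m_y_pitch=1.2, d=d)
    print("x layers:", layers_required(wiring_cost(d, d_x), cap_x))
    print("y layers:", layers_required(wiring_cost(d, d_y), cap_y))
```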
Performance comparison between CMOS/SEED & CMOS chips

Year of first shipment               1999    2001    2003    2006    2009
Technology (µm)                      0.18    0.15    0.13    0.10    0.07
# of Metal Layers Required (x, y)    1,1     1,1     2,1     2,2     2,2
Normalized Transistor Density        0.778   0.778   0.645   0.592   0.675
Normalized On-chip Clock             0.768   0.768   0.737   0.706   0.706
Normalized Aggregate Bandwidth       2.131   2.740   4.392   7.210   9.716

[Figure: transistor density (%) available to CMOS/SEED chips (50 to 80) and number of metal layers required for SEED routing (1 to 5) versus technology (µm: 0.18, 0.15, 0.13, 0.1, 0.07).]
[Figure: bandwidth (GB/s, log scale from 100 to 100000) and number of I/Os (log scale from 1000 to 100000) versus technology (µm: 0.18, 0.15, 0.13, 0.1, 0.07). Series: Max BW (SEED), #I/Os (SEED), #I/Os (BGA).]
Design Cost Estimation
Given that the design information is available:
■ Chip area can be estimated.
■ If the cost of design is fixed, what configurations can be implemented?
■ To conclude, the model gives relevant information, not previously known, regarding optoelectronic implementations of complex chip designs.
The results can be used to validate that, even with the wiring problem, complex optoelectronic network routers can still be effectively implemented!!
Core-based CMOS/SEED Chips: Are They Effective?
Compared to pure-CMOS chips, CMOS/SEED chips:
■ sacrifice at most 40% of transistor density and 30% of on-chip clock rates in exchange for an order of magnitude more I/O pin-outs.
Given that:
■ optoelectronic-compatible CAD tools are available.
■ The bottom line: as transistors become cheaper over time, complex CMOS/SEED chips provide the valuable bandwidth critically needed by current and next-generation computer systems, at an acceptable cost.
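The "at most 40%" and "30%" figures can be checked against the normalized values in the performance comparison table earlier in the deck. The sketch below copies those rows and reports the worst-case loss; the variable and function names are illustrative.

```python
# Normalized values (CMOS/SEED relative to pure CMOS) for 1999 through 2009,
# copied from the performance comparison table.
NORMALIZED_TRANSISTOR_DENSITY = [0.778, 0.778, 0.645, 0.592, 0.675]
NORMALIZED_ON_CHIP_CLOCK = [0.768, 0.768, 0.737, 0.706, 0.706]


def worst_case_loss_percent(normalized_values):
    """Largest percentage loss relative to the pure-CMOS baseline of 1.0."""
    return round(100 * (1 - min(normalized_values)), 1)


if __name__ == "__main__":
    print("Transistor density loss (%):",
          worst_case_loss_percent(NORMALIZED_TRANSISTOR_DENSITY))
    print("On-chip clock loss (%):",
          worst_case_loss_percent(NORMALIZED_ON_CHIP_CLOCK))
```

The worst cases come out at roughly 41% for transistor density and 29% for on-chip clock rate, in line with the rounded figures quoted on the slide.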
Fully adaptive wormhole network router*
[Figure: router block diagram. Input physical channels (optical) X+, X−, Y+, Y− each enter through an OEI, a DM, and an FC with IB; Proc In arrives from the processing node. A 5 x 6 crossbar switch, controlled by the normal router and the deadlock router, connects the inputs to the outputs and to the deadlock buffer (DB). Output paths each pass through an FC with OB, an MX, and an EOI to the output physical channels (optical) X+, X−, Y+, Y−, plus Proc Out to the processing node. The diagram distinguishes the normal routing section from the deadlock routing section, and internal flow control from external flow control.]
Legend: DM: Demultiplexer, MX: Multiplexer, FC: Flow Controller, IB: Input VC Buffers, OB: Output VC Buffers, DB: Deadlock Buffer, OEI: Opto-Electronic Interface, EOI: Electro-Optic Interface
*Shown is a 2-D torus-connected, fully-adaptive, deadlock-recovery network router with 1 virtual channel.
The WARRP Core: A Monolithic GaAs Network Router Core
Wormhole Adaptive Recovery-based Routing via Preemption (WARRP) core
■ NCIPT (ARPA) / MIT Optochip Project
■ Implements the core circuit of the deadlock handling mechanisms (deadlock buffer, input/output buffers, arbitration logic, flow control logic).
■ Uses monolithic GaAs-based technology to implement both logic functions and optical I/O (LED and OPFET detector).
■ 1-bit wide, 4-flit deep buffers; ring topology. This is sufficient to demonstrate progressive deadlock recovery.
■ 5-state complex FSM controller with preemption prediction logic.
■ Operates at > 50 MHz (under SPICE).
■ Status: electrical and optical versions are available.
Technology: 0.6µm Vitesse H-GaAs III process (ECL compatible logic)
Die size: 2mm x 1mm
Complexity: ~1,400 transistors
# electrical I/Os: 27 signals
# optical I/Os: 12 single-ended signals
[Figure: chip photograph with the LED and the OPFET detector labeled.]
The WARRP II Router Chip
☛ Ring topology
☛ 4-bit-wide unidirectional channels
☛ 1 virtual channel with 2-flit deep buffers
☛ Fully adaptive, deadlock recovery routing
☛ The core circuitry requires ~10,000 transistors.
☛ Die size (core) is 0.836 x 0.822 mm² (0.687 mm²)
☛ 40 electrical I/Os and 20 dual-rail SEED I/Os
☛ CMOS HP14B process with 3 metal layers
☛ Operates at ~30MHz (using IRSIM)
[Figure: chip layout showing the WARRP II core circuitry, the SEED modulator and driver circuits, and the SEED receiver and driver circuits.]
Contributions
■ Explain the network bandwidth problem, which is becoming more and more critical in multiprocessor systems.
■ Introduce the connection capacity concept and establish a cost and performance model based on it to analyze the performance of 3-D optical networks.
■ Identify the wiring problem in complex CMOS/SEED chip designs and model the performance of such chips, incorporating the wiring problem, by using a semi-empirical model.
■ Implement optoelectronic network router chips based on the WARRP router architecture using monolithic and hybrid optoelectronic/VLSI technologies.
■ Suggest some advanced architectural techniques to improve the network performance and network bandwidth utilization.
Conclusions
■ Optoelectronic network routers not only increase the network bandwidth but also facilitate the development of high-performance network routers.
■ Optoelectronic network routers are feasible given that packaging, device, and optoelectronic-compatible CAD tool technologies are effectively addressed.
■ An optoelectronic network router shows the potential to outperform its electronic counterpart in terms of available bandwidth and number of I/O pin-outs.
■ Internal network router architectures must also be reevaluated to maximize the on-chip bandwidth so that they do not become a bottleneck in a high-bandwidth interconnect environment.
Future Work
Network performance and bandwidth utilization can be improved by incorporating advanced architectural techniques such as:
■ Efficient channel configurations.
■ Asynchronous token-based channel arbitration.
■ Flit-bundling transfer technique.
■ Delayed-buffer technique.