100 Gb/s Electrical Links System View
Mark Gustlin – Xilinx
Beth Kochuparambil – Cisco
Kent Lusted – Intel
Gary Nicholl – Cisco
David Ofelt – Juniper
Rob Stone – Broadcom
Ground Rules
This presentation is about motivating use cases and the issues around each
◦ Expect other presentations in response, providing:
  ◦ objectives
  ◦ technical feasibility (TF)
  ◦ economic feasibility (EF)
  ◦ broad market potential (BMP)
Some traditional objectives (ex: chip to module) may have different issues at each end of the link
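System Overview – Pizza Box
[Figure: pizza box block diagram. The ASIC reaches each Module* through the ASIC escape and the module connection, either directly or through a retimer, ending at a module connector. Modules carry existing PMDs or new copper cable PMDs.]
*: Module can be traditional front-panel module or OBO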
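System Overview – Line Card
[Figure: line card block diagram. The ASIC reaches the backplane connector over a backplane link and each Module* over the ASIC escape and module connection, in each case either directly or through a retimer. Modules attach at a module connector and carry existing PMDs or new copper cable PMDs.]
*: Module can be traditional front-panel module or OBO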
General System Observations
For line-card based systems, a faster line card is less useful:
◦ If there isn’t backplane bandwidth to support it
◦ If there isn’t enough faceplate density (or board surface area) to get the bandwidth out of the box
For standalone systems, a faster ASIC can still be useful
◦ No increase in faceplate density, but will increase total bandwidth and crossbar radix
OIF 100 Gb/s SERDES work:
◦ Are the current channels and objectives appropriate for the work here?
Boards are big: getting from the center to modules in the front corners can be tricky
◦ OBO don’t necessarily help that much, since they can’t all be packed right next to the ASIC
Copper Cable
What are achievable reaches for:
◦ Passive copper cable?
◦ Active copper cable?
What are system implications to achieve these reaches?
What are the end-user implications of these reaches?
◦ What common data center architectures work/don’t work?
What is the cost delta for active vs. passive cable?
◦ Impacts broad market potential for Cu cable if the cost approaches optics
Copper Cables in Data Center Racks
[Figure: data center rack topologies by switch placement. Legend: switch, server, Cu cable, fiber cable.]
◦ “Inter-rack Switch”: 5 m DAC reach – 802.3by (25GBASE-CR)
◦ “Intra-rack Switch”: 3 m DAC reach – 802.3cd; 3 m DAC reach – 802.3by (25GBASE-CR-S)
◦ “End of Row Switch”: no DAC reach
◦ “Middle of Rack Switch”: 1-2 m? DAC reach
What length is too short?
Are FEC or host loss changes needed?
What is the impact to industry?
Module Connection (AUI)
Backwards compatibility issues (see following slides)
What are the system implications to make 100 Gb/s AUIs work?
◦ Retimer proximity/universality, PCB material, fly-over cables required, etc.?
◦ Do these requirements affect EF or BMP?
What are the economic tradeoffs between redoing PMD budgets and spending significantly more on the system?
How different is the module TX/RX SERDES feature set from the ASIC TX/RX SERDES feature set?
Systems will need to support modules that run many rates of Ethernet
◦ Hard to have loss budgets that are different per rate when the same module can operate at different rates using breakout.
Backwards Compatibility Issues
Will we be able to or want to support existing (or soon to be existing) PMDs with a new AUI?
◦ Assumption is we will want to support 802.3bs/cd PMDs, especially the DR versions since those are 100G per lane
Issues include:
◦ FEC partitioning budgets
◦ FEC choice
◦ Latency
◦ Backwards compatibility
Solution space to investigate:
1. Make 100 Gb/s chip-to-module links work with 802.3bs and 802.3cd electrical FEC budgets
2. Change end-to-end link budget of 802.3bs and 802.3cd PHYs to allocate more errors to the electrical links
3. Terminate FEC in module and regenerate FEC for the wire (segment-by-segment FEC)
4. Add a (hopefully lightweight) wrapper FEC to protect 100 Gb/s electrical links
200/400 GbE “Legacy” Module Based PHYs

200/400GE PHYs   Technology/Reach              FEC           FEC Coverage
200GBASE-DR4     4 lanes SMF, 500 m reach      RS(544,514)   End to End
200GBASE-FR4     4 WDM SMF, 2 km reach         RS(544,514)   End to End
200GBASE-LR4     4 WDM SMF, 10 km reach        RS(544,514)   End to End
400GBASE-SR16    16 lanes MMF, 100 m reach     RS(544,514)   End to End
400GBASE-DR4     4 lanes SMF, 500 m reach      RS(544,514)   End to End
400GBASE-FR8     8 WDM SMF, 2 km reach         RS(544,514)   End to End
400GBASE-LR8     8 WDM SMF, 10 km reach        RS(544,514)   End to End

The majority of these PHYs are based on 50 Gb/s per lane technology
◦ Supporting them with 100G electrical interfaces requires a reverse mux in the module
◦ Bit muxing is supported for changing lane widths
Only 400GBASE-DR4 uses 100 Gb/s lane technology
FEC is end-to-end, RS(544,514)
PMD portion of the BER end-to-end budget is always 2.4 × 10^-4
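As a rough illustration of the reverse-mux point, the sketch below counts how many nominal 100G electrical lanes each PHY in the table above would need and how many optical lanes each electrical lane would have to feed. The function name `mux_ratio` is hypothetical, and the nominal 100 Gb/s lane rate ignores FEC and encoding overhead.

```python
from fractions import Fraction

ELECTRICAL_LANE_GBPS = 100  # nominal rate of the new AUI lane (overhead ignored)

# (PHY, nominal MAC rate in Gb/s, optical lanes), copied from the table above
PHYS = [
    ("200GBASE-DR4", 200, 4),
    ("200GBASE-FR4", 200, 4),
    ("200GBASE-LR4", 200, 4),
    ("400GBASE-SR16", 400, 16),
    ("400GBASE-DR4", 400, 4),
    ("400GBASE-FR8", 400, 8),
    ("400GBASE-LR8", 400, 8),
]

def mux_ratio(mac_rate_gbps, optical_lanes, lane_gbps=ELECTRICAL_LANE_GBPS):
    """Return (electrical lanes, optical lanes fed by each electrical lane)."""
    electrical_lanes = mac_rate_gbps // lane_gbps
    return electrical_lanes, Fraction(optical_lanes, electrical_lanes)

for name, rate, lanes in PHYS:
    e_lanes, ratio = mux_ratio(rate, lanes)
    # A ratio of 1 means the module can pass lanes straight through.
    needs_mux = "reverse mux needed" if ratio != 1 else "no mux"
    print(f"{name}: {e_lanes} x 100G electrical, {lanes} optical lanes, 1:{ratio} ({needs_mux})")
```

Only the 400GBASE-DR4 row comes out as 1:1, which matches the observation that it is the only PHY here using 100 Gb/s lane technology.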
100 GbE “Legacy” Module Based PHYs

100GE PHYs       Technology/Reach              FEC           FEC Coverage
100GBASE-SR10    10 lanes MMF, 100 m reach     No FEC        N/A
100GBASE-SR4     4 lanes MMF, 100 m reach      RS(528,514)   PMD only
100GBASE-LR4     4 WDM SMF, 10 km reach        No FEC        N/A
100GBASE-ER4     4 WDM SMF, 40 km reach        No FEC        N/A
100GBASE-SR2     2 lanes MMF, 100 m reach      RS(544,514)   End to End
100GBASE-DR      1 lane SMF, 500 m reach       RS(544,514)   End to End

These PHYs are based on 10, 25, 50 and 100 Gb/s per lane technology
◦ Supporting them (with the exception of DR) with 100G electrical interfaces requires a reverse mux in the module
Only 100GBASE-DR uses 100 Gb/s lane technology
FEC is end-to-end for some (SR10, LR4 and ER4 don’t require FEC at all)
There are many derivative PMDs in the industry specified by MSAs etc.
◦ They reuse the IEEE PCS/FEC
PMD portion of the BER end-to-end budget is 2.4 × 10^-4 for the DR PHY
Opt 1: Legacy BER Partitioning for 802.3cd/bs PHYs
For a complete PHY the error partitioning below gives us a BER of 1 × 10^-13 (or equivalent frame loss ratio)
If we keep the C2M segment at a BER of 1 × 10^-5 or better for 100G per lane interfaces, then we can reuse the cd/bs PMDs as is
◦ Assuming we were to keep using the RS(544,514) FEC
◦ Allows for both module and host backwards compatibility (see right side)
Otherwise we can’t directly support these existing PMDs
[Figure: a new host device (C2C or retimer, then C2M over a 100G/lane AUI) with a new module, linked to a legacy module and legacy host device (C2M over a 50G/lane AUI, then C2C or retimer). Budgets: BER < 1 × 10^-5 on each C2M segment and BER < 2.4 × 10^-4 on the PMD link, with 544 FEC at each end.]
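The Option 1 partitioning can be sanity-checked numerically. The sketch below is a simplified model, not the method used in 802.3bs/cd: it assumes independent random bit errors (no bursts or error propagation), adds the segment BERs linearly, and asks how often an RS(544,514) codeword with 10-bit symbols sees more than t = 15 symbol errors. The segment names and the function `codeword_failure_prob` are illustrative.

```python
import math

N, K, SYMBOL_BITS = 544, 514, 10   # RS(544,514) over 10-bit symbols
T = (N - K) // 2                   # correctable symbols per codeword

def codeword_failure_prob(ber):
    """Probability that more than T symbols of one codeword are in error,
    assuming independent random bit errors at the given pre-FEC BER."""
    p_sym = 1.0 - (1.0 - ber) ** SYMBOL_BITS
    total = 0.0
    for i in range(T + 1, N + 1):
        # Binomial term computed in log space to avoid overflow/underflow.
        log_term = (math.lgamma(N + 1) - math.lgamma(i + 1) - math.lgamma(N - i + 1)
                    + i * math.log(p_sym) + (N - i) * math.log1p(-p_sym))
        total += math.exp(log_term)
    return total

# Budgets quoted on the slide: two C2M segments plus the PMD link.
segments = {"near-end C2M": 1e-5, "PMD": 2.4e-4, "far-end C2M": 1e-5}
total_ber = sum(segments.values())  # small-BER approximation: errors add

print(f"accumulated pre-FEC BER ~ {total_ber:.2e}")
print(f"uncorrectable codeword probability ~ {codeword_failure_prob(total_ber):.2e}")
```

Because the tail probability is very steep in this region, small changes to the C2M allocation move the result by a large factor, which is why the 1 × 10^-5 figure for the electrical segments matters.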
Opt 2: New BER Partitioning for 802.3cd/bs PMDs
Would need to redefine the PMD BER requirements (for example for 100GBASE-DR)
This would create new PMD definitions, not compatible with the 802.3cd/bs PMD
◦ Might be possible to allow interop at reduced loss for legacy modules (might add confusion to the market?)
◦ But might not be able to, depending on the error floor?
◦ Allows for host backwards compatibility (see right side)
[Figure: a new host device (C2C or retimer, then C2M over a 100G/lane AUI) with a new module, linked to a legacy module (marked “??”) and legacy host device (C2M over a 50G/lane AUI with reduced distance/loss, then C2C or retimer). Budgets: PMD link at a lower error ratio than a BER of 2.4 × 10^-4, 100G/lane C2M at a higher error ratio than a BER of 1 × 10^-5, legacy C2M at BER < 1 × 10^-5, with KP4 FEC at each end.]
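The Option 2 trade can be pictured as moving error budget from the optical link to the electrical link while the end-to-end total stays fixed. The sketch below assumes the total is the legacy partition (2.4 × 10^-4 for the PMD plus 1 × 10^-5 for each of two C2M segments) and that segment BERs add linearly. The trial C2M values and the function `pmd_budget` are hypothetical, not proposals from the presentation.

```python
# Legacy partition: PMD plus two C2M segments (assumed fixed end-to-end total).
TOTAL_PRE_FEC_BER = 2.4e-4 + 2 * 1e-5

def pmd_budget(new_c2m_ber, legacy_c2m_ber=1e-5, total=TOTAL_PRE_FEC_BER):
    """BER left for the PMD link once the electrical segments take their share."""
    remaining = total - new_c2m_ber - legacy_c2m_ber
    if remaining <= 0:
        raise ValueError("electrical segments consume the whole budget")
    return remaining

# Hypothetical allocations for the 100G/lane C2M segment.
for c2m in (1e-5, 5e-5, 1e-4):
    print(f"100G C2M at {c2m:.0e} -> PMD must meet {pmd_budget(c2m):.2e}")
```

Any allocation above 1 × 10^-5 for the new C2M segment pushes the PMD requirement below 2.4 × 10^-4, which is the incompatibility with existing cd/bs PMDs that the slide describes.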
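Opt 3: Segment by Segment FEC
The 100G per electrical lane FEC will have to be terminated in the module, so no dependencies exist
◦ Same (RS(544,514)) or different FEC could be used for electrical interface
◦ Adds latency to the span and complexity/power to the modules
◦ Allows for both module and host backwards compatibility (see right side)
[Figure: a host device (C2C or retimer, then C2M over a 100G/lane AUI) with a new module, linked to a legacy module and legacy host device (C2M over a 50G/lane AUI, then C2C or retimer). The 100G/lane C2M segment is protected by its own “? FEC” with BER < open; the PMD link carries KP4 FEC with BER < 2.4 × 10^-4; the legacy C2M segment is at BER < 1 × 10^-5.]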
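Opt 3: BER Partitioning for 802.3ba/bm PHYs
If there is interest in supporting these older PMDs, the FEC must be segment by segment
The 100G per lane FEC will have to be terminated in the module, so no dependencies exist
◦ Allows for both module and host backwards compatibility (see right side)
[Figure: a host device (C2C or retimer, then C2M over a 100G/lane AUI) with a new module, linked to a legacy module and legacy host device (C2M over a 25G/lane AUI, then C2C or retimer). The 100G/lane C2M segment is protected by its own “? FEC” with BER < open; the PMD link uses KR4 FEC or no FEC, with no FEC or post-FEC BER < 1 × 10^-12; the legacy C2M segment is at BER < 1 × 10^-12.]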
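Opt 4: New FEC Wrapper
This option adds some additional FEC (wrapper) to the data
◦ Allows for both module and host backwards compatibility (see right side)
◦ This additional FEC is point to point across the AUI interface and terminated in the module
◦ Will add additional latency
◦ Will increase the data rate
◦ Adds complexity/power to the module (and host device)
[Figure: a host device (C2C or retimer, then C2M over a 100G/lane AUI) with a new module, linked to a legacy module and legacy host device (C2M over a 50G/lane AUI, then C2C or retimer). The 100G/lane C2M segment carries a wrap FEC at both ends with BER < open; KP4 FEC runs end to end with BER < 2.4 × 10^-4 on the PMD link; the legacy C2M segment is at BER < 1 × 10^-5.]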
ASIC Escape
Next-generation ASICs want to (at least) double their bandwidth
◦ 50 Gb/s generation have likely maxed out SERDES count per die/package
◦ No next-generation product for some architectures without 100 Gb/s SERDES
Are the ASIC SERDES universal (CR/KR/C2C/C2M) or are they just (say) C2C and retimers provide the other connectivity?
100 Gb/s ASICs can use retimers to downshift to 50 Gb/s AUIs (etc.)
◦ No faceplate density increase, but allows for ASIC bandwidth growth.
Switch Die Area Increasingly Consumed by I/O
Commercial Example: Ethernet Switch
– 128 x 25G lanes
– 30% chip area is IO (shaded blue), 70% for other mission functions
– Grows to >>30% in 50G IO generation!
Key requirement for 100G generation IO: preserve silicon area for value added mission functions
– Minimize power
– Minimize area (cost and device feasibility)
– PCS & FEC logic commonality (reuse where possible)
– “Balanced” serdes choice: area & power vs. reach/capability
Package Design
LGA: larger packages, higher package loss, additional cost
BGA: mature technology, lowest cost packaging
[Figure: package size (mm, axis 30 to 110) versus total IO bandwidth (Tb/s, axis 0 to 50), with commercial products marked.]
Driven primarily by switch package escape
Practical BGA limit is ~256 lanes in a 70 mm package, 1 mm ball pitch
Products announced already with 50G IO with this form factor (12.8T)
Switch processing capacity doubling every ~24 mo
100G IO required in products by ~2019 to continue the growth trend at a constant rate!
When is 100G electrical I/O required?
>2019 ASIC requirements are expected to exceed BW delivered by a conventional BGA with 50G IO
[Figure: Leading Switch Capacity vs Year, with the 10G IO, 25G IO, 50G IO and 100G IO generations marked and 2019 highlighted.]
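The package-escape argument is simple arithmetic, sketched below using the ~256-lane BGA limit and the 12.8T figure quoted on the slides. The function names are hypothetical, and the "next generation" is assumed to be one capacity doubling from 12.8T.

```python
BGA_LANE_LIMIT = 256  # approximate practical limit quoted for a 70 mm BGA

def package_bandwidth_gbps(lanes, lane_rate_gbps):
    """Aggregate IO bandwidth of a package with the given lane count."""
    return lanes * lane_rate_gbps

def lanes_needed(capacity_gbps, lane_rate_gbps):
    """Lanes required to carry a switch capacity (ceiling division)."""
    return -(-capacity_gbps // lane_rate_gbps)

current = package_bandwidth_gbps(BGA_LANE_LIMIT, 50)  # 50G IO generation
next_gen = 2 * current                                 # one capacity doubling

for rate in (50, 100):
    lanes = lanes_needed(next_gen, rate)
    verdict = "fits" if lanes <= BGA_LANE_LIMIT else "exceeds BGA limit"
    print(f"{next_gen / 1000:.1f}T at {rate}G IO: {lanes} lanes ({verdict})")
```

With 50G IO the doubled capacity needs twice the lanes the package can escape, while 100G IO keeps the lane count at the same ~256 limit.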
Non-retimed Feasibility Estimates

Interface                              Architecture                                     Approx. channel loss (ball-ball, dB)   Feasibility rank
Chip to Module (assumes 1RU system)    Conventional (10” PCB + front-panel module)      20 [1]                                 High
                                       Internally cabled host + front-panel module      15 [2] ?                               High
                                       Mid-board optical module                         10-15 [2]                              High
Chip to Chip                           Conventional (PCB + mezz connector)              40 [1]                                 Low
                                       Internally cabled + mezz connector               ~20-30                                 Med
Backplane (KR)                         1 m conventional PCB                             60 [1]                                 Very Low
                                       Orthogonal                                       20-40 [2]                              Med
                                       Cabled backplane                                 20-35 [2]                              Med
Copper Cable (CR)                      Conventional DAC + PCB host                      60 [1]                                 Very Low
                                       Internally cabled host + DAC                     ~20-30                                 Med

[1] Projection: scaled 2x from 802.3cd existing channels
[2] Source: Nathan Tracy, TE Connectivity, DesignCon 2017, CEI-112G: Considering Electrical Channels
• C2M look achievable
• Chip to Chip, Backplane, Copper Cables require architecture change

Right sizing the IO standardization for switching applications
Not surprisingly, tall poles appear to be associated with the longer channels
More data and study required on both channels, as well as serdes capability
Initial 100G/lane electrical applications are likely to be associated with switch package escape, where a high power, larger area serdes will be prohibitive, and time to standardization is critical
Project scope implications?
Backplane Considerations
Line in the sand at 30 dB at 28 GHz
- Past project targets
- Need a starting point to discuss system routing needs
Traditional lengths struggle at this speed step.
PCB industry has made further progress in the last 12 months.
Slide/data used with permission from Nathan Tracy, TE Connectivity. Data presented at DesignCon 2017.
Backplane Considerations
Traditional backplane:
◦ Need to shorten to ~14-17” of Meg6 to meet 30 dB
Given routing required for on and off LCs, routable distance on backplane itself is too limited.
Traditional backplane architecture isn’t very realistic at this speed.
Approximate calculations:
◦ Holding 4 dB for conn + vias
◦ 1.89 dB/in → 13.75”
◦ 1.485 dB/in → 17.5”
◦ Loss #s from Goergen_nea_01_0517
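The approximate calculation above is just the remaining loss budget divided by the per-inch trace loss. A minimal sketch, using the slide's 30 dB budget, 4 dB connector and via allowance, and the two quoted loss figures (the function name is hypothetical):

```python
def routable_inches(budget_db, reserved_db, loss_db_per_inch):
    """Trace length that fits in the budget after reserving loss for
    connectors and vias."""
    return (budget_db - reserved_db) / loss_db_per_inch

BUDGET_DB = 30.0    # line in the sand at 28 GHz
RESERVED_DB = 4.0   # held for connectors + vias

for loss in (1.89, 1.485):  # dB/in figures quoted on the slide
    length = routable_inches(BUDGET_DB, RESERVED_DB, loss)
    print(f"{loss} dB/in -> {length:.2f} in")
```

The results land at roughly 13.8 and 17.5 inches, in line with the ~14-17” range on the slide. That total has to cover the traces on both line cards as well as the backplane itself.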
Backplane Considerations
Orthogonal Connector
◦ Shortening DC traces (on moderate materials) to closer to 6-8” fits in a 30 dB budget
◦ Using 8” flyover cable fits in a 30 dB budget
There are paths forward here.
[Figure: simulated channel results for a 6” DC trace in low cost FR4 and for a 2” trace + 8” cable.]
Simulation results used with permission by Nathan Tracy, TE Connectivity.
Backplane Considerations
Cabled Backplane
◦ Shortening DC traces (on moderate materials) to closer to 4-6” fits in a 30 dB budget
◦ Using 8” flyover cable fits in a 30 dB budget
There are paths forward here.
[Figure: simulated channel results. Labels: 4” PCB traces, 1 m cable; 4” DC trace in FR4; 2” trace + 8” cable; 8” flyover cables, 1 m cable.]
Simulation results used with permission by Nathan Tracy, TE Connectivity.
Backplane Summary
Design choices will factor in more than ever before:
- Medium used (PCB vs. cable)
- Surface roughness
- Vias & stubs
- Ground layer thickness
- Radiated emissions shielding
Pure-backplane links do not have the burden of needing to support optics and passive Cu cable
◦ Universal SERDES on ASICs are desirable (see nicholl_3bs_01c_0115*), but may make things too difficult
Backplane options that remain at 30 dB are still useful for systems!
*http://www.ieee802.org/3/bs/public/15_01/nicholl_3bs_01c_0115.pdf
Retimers
Are a useful tool to make connections function
◦ Can increase TF for trickier links
Increase cost per bit, power, and take up board surface area
◦ These have BMP and EF implications
Needed everywhere or just sparingly where needed?
◦ If we use them everywhere, then ASIC SERDES can be simpler
Discussion
We should keep in mind how long each objective is expected to take to standardize
◦ A subset of the objectives (e.g. C2M) may be economically worth splitting off to finish faster
Need to get consensus on what use cases should be supported