ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin...

25
ENLITENED Annual Program Review LEED – A Lightwave Energy Efficient Data Center October 31, 2019

Transcript of ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin...

Page 1: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

ENLITENED Annual Program Review

LEED – A Lightwave Energy Efficient Data Center

October 31, 2019

Page 2: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

LEED objectives

‣ A robust, scalable, energy-efficient network (ENLITENED Metric 1.1)

‣ Co-optimized across:– Network Architecture

• Efficient all-optical networks• Cost effective and fault tolerant

– Optical Switch• Decouples switching from routing• Based on laser-written “pinwheel”

– Commercially-viable enhanced link-margin interconnects

• Burst-mode APD receiver• WDM modulator array• Broadband mux/demux• Integrated Tx and Rx (Phase 2)

Project Objectives

2

25 Gb/s APD for

burst-mode Rx

Racked“Pinwheel”

Rotor Switch

Efficientall-optical network(Opera)

Page 3: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Phase 2 LEED team Team

‣ Networking– George Papen, George Porter, Alex Snoeren (UCSD)– Max Mellette (inFocus Networks) – Simon Hammond (Sandia National Labs)

‣ Optical Switch– Ilya Agurok, George Papen (UCSD)– Joseph Ford, Max Mellette (inFocus Networks)

‣ Interconnects– Shaya Fainman, Shayan Moookherjea (UCSD)

– Y. Ehrlichman, I. G. Yayla, J. Simons, J. E. Cunningham, A. V. Krishnamoorthy(Axalume)

– Michael Gehl, Christopher T. DeRose, Paul S. Davids, Douglas C. Trotter, Andrew L. Starbuck Christina M. Dallo, Dana Hood, Andrew Pomerene and Tony Lentine(Sandia National Labs)

– 3

Page 4: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

• For shuffle-bound applications:~2x bandwidth ® 2x higher energy efficiency

• For sort application:~3x bandwidth ® 2x higher energy efficiency

• “Bandwidth per buck” determines actual operating point

Solving the energy-efficiency challenge:Reduced bandwidth per buck via optical switching

4

Physically measured confirmation of fundamental LEED tenet: Different applications have different energy-efficiency slope vs bandwidth, but greater network bandwidth leads directly to higher energy efficiency!

2x energy efficiencyimprovement

3x energy efficiencyimprovement

Accomplishments

Page 5: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

5

Solving the control plane challenge:RotorNet (Sigcomm ’17):

Optical switchesR1

R2

RN

H

H

H

H

H

H

H

H

H

Time

Connection patterns implemented by each switch:

…+4 +(N-2)

Matching cycle

Racks

B

C

A +1

…+2 +5 +(N-1)

+1 …

+2 …

+3 ……+3 +6 +N

® Hosts decide when to send given a fixed schedule® Eliminates control complexity

Accomplishments

Page 6: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

6

Solving the latency challenge:Opera protocol (NSDI 2020)

Optical switchesR1

R2

RN

H

H

H

H

H

H

H

H

H

Time

…A2 AM

Matching cycle

Racks

B

C

A A1 A1 …

…B1 B2 BM B1 …

C1 ……C1 C2 CM

® Key insight: we can’t wait for reconfiguration; paths need to be “up” already.

® Other switches available while one reconfigures

Accomplishments

Page 7: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

B1 …

7

Optical switchesR1

R2

RN

H

H

H

H

H

H

H

H

H

Time

Racks

B

C

A A1

…C1 C2

B2

A2

• Need to choose matchings so the union of active matchings provides complete connectivity

• Needs to be true at any point in time, i.e. for all possible unions

® Other switches available while one reconfigures

AccomplishmentsSolving the latency challenge:Opera protocol (NSDI 2020)

Page 8: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Opera Performance:Out-performs hybrid RotorNet network

‣ An all-optical transport is more efficient than a hybrid packet/circuit network based on RotorNet

‣ Much more efficient than standard packet-switched network

Accomplishments

8

Page 9: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Solving the synchronization challenge:Open-source Corundum FGPA-based NIC‣ PCIe interface

– Provides high-performance (25 Gb/s)Direct Memory Access (DMA) engine

‣ Software driver– Connects software networking stack to FPGA- based NIC & circuits

‣ Scalable queue management/synchronization– 100 ns precision using PTP for TDMA (RotorNet/Opera)– 1000+ independent, hardware-managed queues!

9

HostQueue

NICDMAQueueQueue PCIe Q mgmt Network

Accomplishments

Source code:https://github.com/ucsdsysnet/corundum

Provides the “glue” between packet and circuit worlds

Page 10: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Architectures:Phase 1 accomplishments/ Phase 2 Objectives‣ Phase 1 accomplishments

– Control plane: Full stack demonstration of RotorNet• Can run unmodified Linux apps on optical network

– Latency: Developed and simulated Opera• Out-performs hybrid circuit/packet networks

– Synchronization: Corundum FGPA-based NIC• Key interface between circuit and packet worlds

‣ Phase 2 Objectives– Full-stack implementation of Opera– Realistic assessment of optical networks using

community-established benchmarking – Research into viable workloads for HPC/ datacenters

using benchmarks

Accomplishments/Objectives

10

Page 11: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Switching:Rotor switch status

11

Selector switch integrationPhysical format Rack-mounted vs “breadboard” specCrosstalk < 30 dB < 20db requirement)Operating spectrum > 120 nm > 35nm specOptical switching time 15 µs < 75 µs specSystem switching time 40 µs < 100 µs spec2-pass insertion loss 5 – 8 dB < 7 dB spec on average

Sect

or N

umbe

r

BER

Time ->

9 Servers

RotorSwitch

Connections (back)

Accomplishments

Page 12: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Prototype pinwheel in 3.5” HGST Deskstar NAS

Pinwheel with encoder, encoder tracks, and clear cover

12

Accomplishments

Page 13: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Rotor switch transmission

13

Power variations caused by combination of pointdefects and stitching errors during fabrication

2.2 dB

Doublepass rotor switch insertion loss5-8 dB overall, up to 2.2 dB on single output

Loss

in d

B

30 d

B

Doublepass crosstalk

Accomplishments

Page 14: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

01100001 01100110

01100001 0110011001100011 01000110

Switched-network BER measurements

Source TX Channel SinkRX

Corundum NICPRBS31 from a linear feedback chip register

COTS TX1310nm SMF, TXR, PSM-4

Corundum NICReads signal and calculate errors

COTS RX1310nm SMF TXR, PSM-4

Automated hardware platform measurement: PRBS generated in Corundum NIC (or FGPA board). Computer used only to control process and retrieve collected heat-map data.

RotorNet connection:switch / match / switch

(unamplified)

Accomplishments

14

Page 15: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Measured BER “heat map”

15

System-level switching time

System-level switching time includes:• Physical switching time (~22 us)• AGC and CDR lock time (~10 us)• Disk synchronization (~10 us)

Does not include:• Ethernet 64b/66b frame sync• NIC transmit timing accuracy

Sect

or N

umbe

r

Log BERSector Offset (µs)

Total 54sectors

Each heat map shows BER vs time for one input connection, through 54 sectors (3 configurations repeated 18 times) of one full disk rotation.

Accomplishments

Page 16: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Demo of full-stack optically-switched network running unmodified Linux app (iperf)

‣ Ran app “iperf” on a single path– Network measurement application– Unmodified TCP for network stack

16

Every third sector has low errors -sufficient to run app.

Accomplishments

‣ Structure in BER showed some paths through the rotor switch are usable (with few errors from pinwheel fab errors /power offset)

Zoomin

Page 17: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Switches:Phase 1 accomplishments/ Phase 2 Objectives‣Phase 1 accomplishments

– Design, built, and tested first Rotor switch• 2-pass insertion loss: 5 – 8 dB• Optical switching time 15 µs; System switching time 40 µs• Bandwidth > 120 nm; crosstalk < -30 dB

– Mitigation path for pinwheel fabrication issues • Stitching errors can be corrected by laser writing tool adjustments• Point defects corrected by reduced contamination/larger spot size

‣ Phase 2 Objectives– Develop manufacturable large port-count Rotor switch

• Replace fiber array with collimator array• Replace fiber patch panel w/micro-optics array

Accomplishments/Objectives

17

Page 18: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

‣ I. Optically-interconnected, electrical switching

ElectronicSwitch

~ 10 pJ/bit12.8 Tb/s IO

TxRxTxRx

TxRxTxRx

TxRxTxRx

TxRxTxRx

‣ Switch energy is relatively high

‣ Link metrics– 2 pJ/bit

‣ BW density– 1 Tb/s/cm(a)

OpticalSwitch

TxRx

TxRx

TxRx

TxRx

‣ II. Optically switched ‣ Switch energy is low‣ Switch loss is managed w/o amplifiers

– Link is optimized for margin– Link metrics(1.2)

1 pJ/bit excluding laser power+ 1 pJ/bit laser x excess switch loss= 2 pJ/bit for a lossless switch

– Link metric vs ~14 pJ/bit Case I

‣ Scales > 100 Tb/s w/WDM(b)

LEED Interconnect objectives

18

Project Objectives

Page 19: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

LEED interconnect Phase 2 projects‣ Integrated Transmitter

– UCSD: modulator/testing Forrest Valdez, Shayan Mookerjea– UCSD: laser source testing Suruj Deka, Shaya Fainmain– Axalume: source array/integration Y. Ehrlichman, I. G. Yayla, J.

Simons, J. E. Cunningham, A. V. Krishnamoorthy

‣ Integrated Receiver– UCSD: demux Jordon Davis, Shaya Fainman– Axalume: BM Rx front end same at Tx team– Sandia: demux/APD Michael Gehl, Doug Trotter, Andrew

Starbuck, Tina Dallo, Dana Hood, Andrew Pomerene, Paul Chris DeRose, Tony Lentine

‣ The rest of the LEED team members all have input to the interconnects (co-design)

19

Team

Page 20: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

25 Gb/s APDs for integrated BM Rx‣ Demonstrate an APD with a

responsivity of 10 A/W and a 3 dB bandwidth of 25 Gb/s.

– Link modeling shows ~ 3.5 A/W is needed to achieve 2 pJ/bit with the switch in the path.

20

‣ Results:– R=5.4 A/W and B=21.8 GHz,

satisfies 25 Gb/s at 2 pJ/bit– R=7.1 A/W and B=18.8 GHz,

satisfies 25 Gb/s at 2 pJ/bit– R=3.1 A/W and B=46.1 GHz,

potential for 50 Gb/s– R=1.7 A/W and B=71.4 GHz,

potential >> 50 Gb/s

Eye at 25 Gb/s from APD @ 3.5 A/W

Accomplishments

Page 21: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

High OMA WDM modulator array

OpticalFilter/Splitter

FPGA(BER/tuning)

ModulatorArray

AC ProbeArray

DC Probe(tuning)

High-speedphotodetector

WDM SourceArray

Measure Eye at 10 Gb/s

‣ Goal: Demonstration of an eight channel 25 Gb/s modulator array with comb source and closed loop control

‣ Challenge: close link w/o optical amp‣ Requires detailed device

modeling/characterization

Custom Picoprobe 40A

Optoplex 100 GHz 3-port ROADM

Xilinx VCU1x8

21

RF Amp (Optional)

Accomplishments

Page 22: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

22

QSF

P28

QSF

P28 FPGA

2-channel VOA(MM&SM)

2x1switch

(MM&SM)

Digitalswitchcontrol

Control host

RX

pow

er

Time

BER

Time

…TX 1TX 2 RX

P1

P2

Tslot

Tlock = f (P1,P2,Tslot,Df,Df)

Splitter1%

99%

Power meter

Goal: map the functional relationship of Tlock = f (P1,P2,Tslot,Df,Df) for various commercial and LEED-developed transceivers.

USBEthernet

USB&

Ethernet

Relative TX clock phase = Dfand frequency = Df

TTL

AccomplishmentsTransparent switching testbed

Page 23: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Transparent switching testbed operationAccomplishments

23

0 50 100 150-10

-5

0

log 10

(BER

)

(a) 0dB Power Offset

ChA ChB ChA

0 50 100 150Offset ( s)

-10

-5

0

log 10

(BER

)

(b) 5 dB Power Offset

ChA ChB ChA

(c) BER Heat Map

ChA ChB ChA

0 50 100 150Offset ( s)

-6

-4

-2

0

2

4

6

8

Pow

er O

ffset

(dB)

-8

-7

-6

-5

-4

-3

-2

log 10

(BER

)

Time-resolved BER measurement using two transmitters w/each from a separate PSM4 module

Page 24: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Interconnect:Phase 1 accomplishments/ Phase 2 Objectives‣ Phase 1 accomplishments

– Burst-mode Rx: key feedforward operation demonstrated– APD: 25 Gb/s @ 3.5 A/W demonstrated à2 pJ/bit link– Modulator Array: Coupled physics/device design process– Switched Links: Switched power offset identified as issue

‣ Phase 2 Objectives– Integrated Tx

• Combines modulator work with source array and control– Integrated Rx

• Combines mux/demux, APD, and BM frontend– Goal: Co-optimize photonic Tx/Rx chips to close link w/o

amplification & mitigate switched power offset

Accomplishments/Objectives

24

Page 25: ENLITENED Annual Program Review LEED – A Lightwave Energy ... · – Link is optimized for margin – Link metrics(1.2) 1 pJ/bit excluding laser power +1 pJ/bit laser x excess switch

Conclusions

‣ Phase 1 of LEED developed the key technologies (network architecture/protocol, switch, and interconnect) for a practical, cost-effective energy-efficient optical network for datacenters and HPC

‣ Phase 2 will focus on the manufacturable integration of those components to demonstrate a viable use case for energy-efficient optical networking

– Demonstrate Opera w/viable use cases– Manufacturable low-loss large port count Rotor switch– Integrated Tx and Rx that can close a link w/o optical

amplification & address transient power offset issues

25