AltiumLive 2017: PCBs for Computing Density From Big Bang to the Automobile

Andreas Doering, IBM Research – Zurich Laboratory

Agenda

1 Motivation for Microservers
2 The DOME project
3 Boards
4 Insights
5 Outlook

* IDC HPC technology excellence award, ISC17

DOME PPP: ASTRON, IBM, Dutch government

Ronald P. Luijten / July 2017

SKA (Square Kilometer Array) to measure Big Bang

Picture source: NZZ, March 2014

[Figure: timeline of the universe. Time axis: 0, 10^-32 s, 10^-6 s, 0.01 s, 3 min, 380’000 years, 13.8 billion years. Labelled events: Big Bang, inflation, protons created, start of nucleosynthesis through fusion, end of nucleosynthesis, modern universe.]

SKA: What is it?

Top 500: Sum = 123 PFlops, 2 GFlops/watt. 100x Flops of Sum! ~7 GWh

~3000 dishes, 3 GHz-10 GHz.

~0.5M antennae, 0.5 GHz-1.7 GHz.

~0.5M antennae, 0.07 GHz-0.45 GHz.

1. 10^9 samples/second * 0.5M antennae: 0.5 * 10^15 samples/sec.

2. 3.5 * 10^9 samples/second * 0.5M antennae: 1.7 * 10^15 samples/sec.

3. 2 * 10^10 samples/second * 3K antennae: 6 * 10^13 samples/sec.

Sum = 2 * 10^15 samples/second @ 86400 seconds/day:

170 * 10^18 (Exa) samples/day. Assume 10-12x reduction @antenna:

14 Exabytes/day (minimum).
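The sample-rate arithmetic on this slide can be retraced with a few lines of Python. This is only a sketch built from the slide's own figures; the array names (`aperture_array_1`, `aperture_array_2`, `dishes`) are labels invented here for illustration. Note that the unrounded sum is about 2.3 * 10^15 samples/second, while the slide rounds down to 2 * 10^15 before multiplying by the seconds per day, which is how it arrives at 170 Exa-samples/day.

```python
# Back-of-the-envelope SKA data volume, using only the figures on the slide.
SECONDS_PER_DAY = 86_400

# Hypothetical labels -> (samples per second per antenna, number of antennae)
arrays = {
    "aperture_array_1": (1e9, 0.5e6),
    "aperture_array_2": (3.5e9, 0.5e6),
    "dishes": (2e10, 3e3),
}

# Aggregate sample rate per array and in total (samples per second)
rates = {name: per_antenna * count for name, (per_antenna, count) in arrays.items()}
total_rate = sum(rates.values())

# Unrounded daily volume, and the slide's version that rounds the sum to 2e15 first
samples_per_day = total_rate * SECONDS_PER_DAY
slide_samples_per_day = 2e15 * SECONDS_PER_DAY

# Upper end of the slide's assumed 10-12x reduction at the antenna
REDUCTION = 12
slide_reduced_per_day = slide_samples_per_day / REDUCTION

for name, rate in rates.items():
    print(f"{name}: {rate:.2e} samples/s")
print(f"total (unrounded): {total_rate:.2e} samples/s")
print(f"per day (unrounded): {samples_per_day / 1e18:.0f} Exa-samples")
print(f"per day (slide rounding): {slide_samples_per_day / 1e18:.0f} Exa-samples")
print(f"after {REDUCTION}x reduction: {slide_reduced_per_day / 1e18:.1f} Exa/day")
```

With a 10x instead of a 12x reduction the daily volume is correspondingly larger, which is why the slide calls 14 Exabytes/day a minimum.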

© 2016 IBM Corporation

~ 10 Pb/s

86’400 sec/day

14 ExaByte/day

?

~ 1 PB/Day.

330 disks/day

120’000 disks/yr?

Top-500 Supercomputing (11/2013): 0.3 Watt/Gflop/s

Today’s industry focus is 1 Eflop @ 20 MW (2018): 0.02 Watt/Gflop/s

Most recent data from SKA:
CSP: max. power 7.5 MW
SDP: max. power 1 MW

Latest need for SKA: 4 Exaflop (SKA1-Mid), 1.2 GW…80 MW

[Chart labels: Too easy (for us); Too hard; Moore’s law; Factor 80-1200; SDP; CSP]

multiple breakthroughs needed
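The 1.2 GW…80 MW range and the factor 80-1200 can be reproduced from the efficiency figures on this slide. The sketch below assumes that the 2018 target of 1 Eflop at 20 MW corresponds to 0.02 Watt per Gflop/s, and that the factor compares the required power against the 1 MW SDP budget; both readings are interpretations of the slide, not statements from the speaker.

```python
GFLOPS_PER_EFLOP = 1e9  # 1 Exaflop/s = 1e9 Gflop/s


def power_watts(exaflops: float, watt_per_gflops: float) -> float:
    """Electrical power needed to sustain `exaflops` at a given efficiency."""
    return exaflops * GFLOPS_PER_EFLOP * watt_per_gflops


SKA_NEED_EFLOPS = 4          # SKA1-Mid requirement from the slide
EFF_2013 = 0.3               # Watt per Gflop/s, Top-500 (11/2013)
EFF_2018 = 20e6 / 1e9        # 20 MW per Exaflop -> 0.02 Watt per Gflop/s
SDP_BUDGET_WATTS = 1e6       # SDP max. power (assumed reference for the factor)

p_2013 = power_watts(SKA_NEED_EFLOPS, EFF_2013)
p_2018 = power_watts(SKA_NEED_EFLOPS, EFF_2018)

print(f"at 2013 efficiency: {p_2013 / 1e9:.1f} GW, factor {p_2013 / SDP_BUDGET_WATTS:.0f}")
print(f"at 2018 target:     {p_2018 / 1e6:.0f} MW, factor {p_2018 / SDP_BUDGET_WATTS:.0f}")
```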

Dome Project:

Research Streams…
- System Analysis
- Data & Streaming
- Sustainable (Green) Computing
- Nanophotonics

…are mapped to research projects (Computing, Transport, Storage, Algorithms & Machines):
- Nanophotonics
- Real-Time Communications
- New Algorithms
- Microservers
- Accelerators
- Access Patterns

…plus an open user platform:
- Student projects
- Events
- Research Collaboration

33M€ 5-year Research Project: 76 IBM PY (32 in NL); 50 ASTRON PY

Definitions

• “Microserver” = The server class of the mobile era

• “Microserver” = SoC + DRAM + Flash + Power

• “Microserver” = Backplane + not-enclosed modules

Motivation

• Silicon scaling limits; energy for computation vs. on-chip communication vs. off-chip communication
• Use of large SMP servers by partitioning, Docker, etc.: cache coherency not fully used
• Emergence of powerful embedded processor cores, in particular ARM
• Premise given through Aquasar cooling work enabled DOME funding

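Table of PCBs

A = Altium Designer, C = Cadence

| Module Name | Iterations | Length [mm] | Width [mm] | Thickness [mm] | Layers | Holes | Components | Nets | Backdrilling | Material | Tool |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P5020/P5040 processor | 3 | 139.7 | 55.5 | 1.28 | 10 | 3242 | 1007 | 539 | no | ISOLA-400 | A |
| Big Baseboard | 1 | 220 | 160 | 1.28 | 10 | 491 | 175 | 154 | no | ISOLA-400 | A |
| Power Converter | 2 | 139 | 56.5 | 1.63 | 8 | 737 | 440 | 231 | no | FR-4 | A |
| mSATA on DIMM | 2 | 139.7 | 55.5 | 1.24 | 4 | 341 | 69 | 67 | no | FR-4 | A |
| 8p1 backplane | 2 | 300 | 200 | 2.7 | 18 | 3582 | 565 | 1326 | no | FR-4+ | A |
| Testboard for switch power converter | 1 | 160 | 220 | 1.6 | 8 | 888 | 259 | 134 | no | FR-4 | A |
| Switch Mothercard | >1 | 139.7 | 57.8 | 3.6 | 28 | 3311 | 837 | 730 | yes | | A |
| Switch Daughtercard | 1 | 139.7 | 57.8 | 1.8 | 10 | 423 | 213 | 160 | no | FR-4 | A |
| Mini baseboard | 2 | 160 | 100 | 1.2 | 6 | 851 | 376 | 241 | no | FR-4 | A |
| Bracket for DIMM connector on Minibaseboard | 2 | 154 | 32 | 1.2 | 6 | 116 | 2 | 98 | no | FR-4 | A |
| Bracket for SPD08 connector on Minibaseboard | 1 | | | | | | | | | | C |
| T4240 processor | 3+1 | 139.7 | 63 | 1.6 | 16 | | 1316* | 820* | | Panasonic | C |
| mSATA on SPD08 | 3 | 139.7 | 62.5 | 1.6 | 6 | 1014 | 79 | 105 | no | FR-4 | A |
| M2 carrier | 2 | 139.7 | 61.6 | 1.6 | 6 | 1091 | 130 | 131 | | | A |
| Auxiliary power converter | 1 | 61 | 56 | 1.6 | 4 | 478 | 74 | 30 | no | FR-4 | A |
| PCIe Extender | | | | | | | | | no | FR-4 | A |
| LS2088 Processor module | 1? | 139.7 | 62.5 | 1.6 | 14 | | 1037* | 714* | no | Panasonic R1577/1570 | C |
| USB HUB Module | 2 | 139.7 | 61.5 | 1.57 | 8 | 1162 | 557 | 387 | no | FR-4 | A |
| BB2 backplane | 2 | 520 | 200 | 3.15 | 22 | 12598 | 1076 | 3820 | 7 Runs | Panasonic Megtron 6 | A |
| Interposer card | 1 | 139.7 | 80 | 1.57 | 8 | 897 | 76 | 132 | 4 Runs | FR-4, Panasonic Megtron 6N | A |
| FMKU2595 FPGA | 2 | 139.7 | 63 | 1.57 | 14 | 7442 | 881 | 914 | no | Panasonic Megtron 6 | A |

(*) Only two of the three counts (holes, components, nets) are given for this board; their assignment to columns is uncertain.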

System Overview

8/32/128 compute nodes

10G Ethernet Switch

storage node

Power converter

P5020/P5040: 2/4 cores PowerPC-64 @ 2.1 GHz, 16 GByte DDR3, 2x XAUI, 4x 1GbE, 2x SATAv1

T4240: 24 cores PowerPC-64 @ 1.8 GHz, 24 GByte DDR3, 4x 10GbE, 2x 1Gb, 2x SATAv2, PCIe-2.0 x8

LS2088: 8x ARMv8 @ 2 GHz, 32 GByte DDR4, 6x 10GbE, PCIe, 2x SATA

FMKU2595 FPGA: 330K LUTs, 4x 10GbE, 4x GbE, 2x SATA

8x mSATA or 2x M2

8x 40Gb Ethernet

Backplane connectors

DIMM socket with removed latches for generation 1

3M’s SPD08 in various lengths for generation 2

Xtreme Poweredge for power converter (both)

3 segments of Molex Impact, 210 contacts (70 diff pairs)

System today

Backplane for • 32 compute nodes,

• 8 populated

• 1 Switch node,

• 1 Management node

• 2 Storage nodes

• Water cooled

View from above

Server nodes

Power node

Storage node

10 GbE Switch

QSFP cages

Water In/Out

Cooling Rails

System Q4 2017

Two backplanes, total 64 compute nodes, e.g.
• 1536 cores
• 1536 GB DRAM
• 64 SSDs
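The example totals match 64 T4240 nodes, using the per-node figures from the System Overview slide (24 cores, 24 GByte DDR3). A minimal sketch of that arithmetic follows; the `NodeType` class and `system_totals` helper are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    cores: int
    dram_gbyte: int


# Per-node figures as listed on the System Overview slide
T4240 = NodeType("T4240", cores=24, dram_gbyte=24)
LS2088 = NodeType("LS2088", cores=8, dram_gbyte=32)


def system_totals(node: NodeType, nodes_per_backplane: int = 32, backplanes: int = 2) -> dict:
    """Aggregate core and DRAM counts for a system populated with one node type."""
    nodes = nodes_per_backplane * backplanes
    return {
        "nodes": nodes,
        "cores": nodes * node.cores,
        "dram_gbyte": nodes * node.dram_gbyte,
    }


for node in (T4240, LS2088):
    print(node.name, system_totals(node))
```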

Gallery of (some) Boards

Power Converter

• Master thesis project:
  • Student did high-level design (e.g. selection of backplane connector), component selection, and schematic entry. Layout was completed by a regular engineer. First version worked.
• 1 iteration to improve stability, protection

Challenges: high current on top/bottom and SMD packages, location of connectors, tight IC/L/C converter triangle, conflict of high-profile Ls and hot ICs that must be covered by cool plate

40 A per contact finger, allowing different types of C/L

Switch Module

Left: Main switch PCB, 130 mm x 55 mm

Right: Switch with mounted daughtercard

Pin Assignment

• Pin assignment has to suit backplane and switch module design
• Both are challenging (backplane has more space, but many more wires)
• Reduce crossings on both boards
• XAUI has low requirements on length balancing
• 1st iteration:
  • Let the CAD tool choose the pinout on both boards independently
  • Find out the critical spots
  • Use Python script to build systematic pinout that circumvents these
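The script used in the project is not shown in the slides. As a rough illustration of the idea only, the sketch below treats the connector as a one-dimensional row of pins and counts a "crossing" whenever two nets sit on the connector in the opposite order to the one a board would prefer. The net names, the position tables, and the weighted-sort heuristic are all assumptions made for this example, not the authors' method.

```python
from itertools import combinations


def count_crossings(pin_order, preferred_pos):
    """Count net pairs whose connector order contradicts the board's preferred order."""
    return sum(
        1
        for first, second in combinations(pin_order, 2)
        if preferred_pos[first] > preferred_pos[second]
    )


def build_pinout(backplane_pos, switch_pos, weight=0.5):
    """Order nets along the connector by a weighted mix of both boards' preferences.

    weight=1.0 follows the backplane only, weight=0.0 follows the switch module only.
    Ties are broken by net name so the result is deterministic.
    """
    return sorted(
        backplane_pos,
        key=lambda net: (weight * backplane_pos[net] + (1 - weight) * switch_pos[net], net),
    )


# Hypothetical preferred positions of four nets on each board
backplane_pos = {"node0": 0, "node1": 1, "node2": 2, "node3": 3}
switch_pos = {"node0": 1, "node1": 0, "node2": 3, "node3": 2}

for weight in (0.0, 0.5, 1.0):
    pinout = build_pinout(backplane_pos, switch_pos, weight)
    print(
        f"weight={weight}: {pinout} "
        f"backplane crossings={count_crossings(pinout, backplane_pos)} "
        f"switch crossings={count_crossings(pinout, switch_pos)}"
    )
```

Sweeping the weight shows the trade-off named on the slide: a pinout that is crossing-free on one board generally pushes crossings onto the other, so a systematic pinout has to balance both.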

PCB Layer Stack

6 inner signal layers, impedance controlled, with shielding ground layers in between

4 high-current power supply lanes

Total PCB thickness: 3.6 mm

Length of connector pins: 1.2 mm

Original assumption, that board space across “through-hole” connector cannot be used, was wrong. Need backdrilling.

Press-fit connector on this side

ASIC on this side
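To see why backdrilling matters with these dimensions, a first-order estimate helps: an unused via barrel acts roughly as a quarter-wave stub. The sketch below uses the board thickness and pin length from the slide; the quarter-wave formula is a common textbook approximation rather than something stated in the talk, and the relative permittivity of 3.7 is an assumed value, not a property of the actual laminate.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def max_backdrill_depth_mm(board_thickness_mm: float, pin_length_mm: float) -> float:
    """Barrel length that is not occupied by the press-fit pin."""
    return board_thickness_mm - pin_length_mm


def stub_resonance_ghz(stub_length_mm: float, eps_r: float) -> float:
    """Quarter-wave resonance of an open via stub (first-order approximation)."""
    stub_length_m = stub_length_mm * 1e-3
    return C0 / (4 * stub_length_m * math.sqrt(eps_r)) / 1e9


BOARD_THICKNESS_MM = 3.6   # from the slide
PIN_LENGTH_MM = 1.2        # from the slide
EPS_R = 3.7                # assumed dielectric constant, not from the slide

unused_barrel = max_backdrill_depth_mm(BOARD_THICKNESS_MM, PIN_LENGTH_MM)
print(f"barrel not needed by the pin: {unused_barrel:.1f} mm")
for stub_mm in (unused_barrel, 1.0, 0.3):
    print(f"stub {stub_mm:.1f} mm -> resonance ~{stub_resonance_ghz(stub_mm, EPS_R):.1f} GHz")
```

Shorter stubs push the resonance to higher frequencies, which is the usual argument for drilling out the unused part of the barrel on high-speed nets.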

PCB routing

This narrow strip (1 cm wide) is one critical part.

Routing between connector pins with 1 signal pair

FPGA Node

PCI- and/or network-attached

2 channels DDR4 (e.g. 16 GByte)

Xilinx® Kintex® UltraScale

6 x 10 GbE, PCIe3 x8, 2 x SATA3

Status: In bringup

FPGA Node – Layout Concept

Flyby control signals on 3 layers, P2P data signals mainly on 1 layer

High-speed IO on 2 inner layers

Cooling

Combination of passive cooling on decapped chip, using vapor chambers and hot-water

Insights

• Main source of error: transfer from data sheet into tool
• Second source of error: harness interface (swapping P/N on diff pairs, clock/data on I2C)
• Third source of error: voltage levels of pins (e.g. enable of power converter)
• Why is there no electronic transfer of component data to designers?
  Exception: TI (e.g. https://webench.ti.com/cad/)
  Why is there no standard format? There was an initiative XMLEDA, etc.
• DRC could do more, if symbols provided the information (e.g. P/N property, clock, etc.)
• Conversion from one tool to another is a nightmare
  Hired Elgris and still 5 working days turned into 2 months
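As an illustration of the kind of check a DRC could run if symbols carried a P/N property, the sketch below flags connections where the polarity implied by the net name disagrees with the polarity implied by the pin name. The `Connection` record, the naming conventions (`_P`/`_N`, `+`/`-`), and the example netlist are hypothetical; no real CAD tool API is used. Some SerDes allow a deliberate polarity swap, so a hit is a prompt for review rather than necessarily an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    net: str  # net name, e.g. "XAUI0_TX_P"
    ref: str  # component reference designator, e.g. "U1"
    pin: str  # pin name from the symbol, e.g. "SD_TX0_N"


def polarity(name: str) -> Optional[str]:
    """Return 'P' or 'N' if the name carries a differential polarity suffix."""
    upper = name.upper()
    if upper.endswith(("_P", "+")):
        return "P"
    if upper.endswith(("_N", "-")):
        return "N"
    return None


def find_polarity_swaps(connections):
    """List connections whose net polarity and pin polarity disagree."""
    swaps = []
    for conn in connections:
        net_pol, pin_pol = polarity(conn.net), polarity(conn.pin)
        if net_pol and pin_pol and net_pol != pin_pol:
            swaps.append(conn)
    return swaps


# Hypothetical netlist fragment: the connector J1 has P and N swapped
netlist = [
    Connection("XAUI0_TX_P", "U1", "SD_TX0_P"),
    Connection("XAUI0_TX_N", "U1", "SD_TX0_N"),
    Connection("XAUI0_TX_P", "J1", "A1_N"),
    Connection("XAUI0_TX_N", "J1", "A2_P"),
    Connection("I2C_SCL", "U1", "IIC1_SCL"),
]

for conn in find_polarity_swaps(netlist):
    print(f"polarity mismatch: net {conn.net} on {conn.ref}.{conn.pin}")
```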

Acknowledgements

This work is the result of many people:
• Ronald Luijten (Lead Architect/Technical Lead), Francois Abel (Switch, FPGA, and BB2 lead), Beat Weiss (Core Engineering), Matteo Cossale (Cooling), Stephan Paredes (Cooling), and others: IBM ZRL/CH
• Peter v. Ackeren, Ed Swarthout, Dac Pham: Freescale/NXP
• Yvonne Chan: IBM Toronto
• Gijs Schonderbeek, Sieds Damstra, Albert-Jan Boobstra: ASTRON/NL
• Several students and interns
• And many more remain unnamed…

Companies: NXP; IBM; TransferDSW – NL, Strukton/NL, Roneda/BE, AT&S/AT, Supercomputing Systems/CH, Miromico/CH

Dutch government for DOME grant

Outlook

• Still work to be done: HW testing, SW, redesign of some boards for bugs or low production yield, cost reduction of some components
• Commercially available through startup ILA Microservers
• First customer bought 15 T4240 modules
• Buildup of two systems for ASTRON and ZRL (with enclosure, etc.)
• GPU node
• Target markets:
  • Data center
  • Scientific computing (SKA)
  • Embedded (vehicles, robots, IoT Edge server)

Backup

System Management

• Every node is a USB device
• Cypress PSoC controller implements module-level management
  • Serial console
  • Power Sequencing
  • Current and Temperature Monitoring
  • JTAG
  • etc.
• Python process on host allows access to all hosts
  • Implements IPMI
  • Interacts with Switch, FPGA tools, etc.

QorIQ T4240 Communication Processor

32-way carrier network topology

[Figure labels: T4240 module; 32-way carrier; FM6000 switch]

32x 10 GbE internal connectivity from switch
8x 40 GbE external connectivity (QSFP+)
Green links optionally connect to other 32-way carrier
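A quick check of the port counts shows that internal and external switch bandwidth are balanced. The sketch below only multiplies the figures from this slide; the helper name `aggregate_gbps` is made up for the example.

```python
def aggregate_gbps(ports: int, gbps_per_port: int) -> int:
    """Total nominal bandwidth of a group of identical ports."""
    return ports * gbps_per_port


internal = aggregate_gbps(32, 10)   # 32x 10 GbE towards the compute nodes
external = aggregate_gbps(8, 40)    # 8x 40 GbE via QSFP+
oversubscription = internal / external

print(f"internal: {internal} Gb/s, external: {external} Gb/s, "
      f"oversubscription {oversubscription:.1f}:1")
```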

Thanks for your attention! Questions?